Nixtla / neuralforecast

Scalable and user friendly neural :brain: forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

Constraint error when tuning with AutoNHITS #957

Closed · zzzrbx closed this 1 month ago

zzzrbx commented 3 months ago

What happened + What you expected to happen

When tuning AutoNHITS with Optuna backend (without using Ray Tune) I occasionally get this error:

neuralforecast/lib64/python3.8/site-packages/torch/distributions/distribution.py", line 68, in __init__
    raise ValueError(
ValueError: Expected parameter df (Tensor of shape (1024, 31)) of distribution Chi2() to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], grad_fn=<MulBackward0>)
[W 2024-04-07 20:31:41,964] Trial 5 failed with value None.

I'm using the latest version, without Ray Tune installed.

Versions / Dependencies

NeuralForecast 1.7.0, Python 3.8

Reproduction script

It'd be difficult to share a script because the error doesn't always happen.

Issue Severity

High: It blocks me from completing my task.

jmoralez commented 3 months ago

Hey @zzzrbx, thanks for using neuralforecast. This isn't a shape error; the failing check expects the values to be positive, but they're all NaNs. Are you using scalers?
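
For illustration, a minimal sketch (not from the thread) of why this message appears: torch validates distribution parameters at construction time, so NaN degrees of freedom trigger exactly this ValueError.

import torch
from torch.distributions import Chi2

df = torch.full((2, 3), float("nan"))  # all-NaN degrees of freedom, as in the report
try:
    Chi2(df)  # parameter validation runs in __init__
except ValueError as e:
    print(e)  # "Expected parameter df ... to satisfy the constraint GreaterThan(lower_bound=0.0) ..."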

zzzrbx commented 3 months ago

Yes, I'm using the robust scaler.


jmoralez commented 3 months ago

Is it on the model (batches) or in the NeuralForecast constructor? If you can at least provide the code you're running, that'd help a lot.
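
For reference, a sketch of the two places a scaler can be configured in neuralforecast (the names and values here are illustrative, not from the thread):

from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

h = 12  # illustrative horizon

# 1) per-window scaling inside the model (what `scaler_type` in a config controls)
model = NHITS(h=h, input_size=2 * h, scaler_type='robust')

# 2) per-series scaling in the wrapper, applied once to the whole dataframe
nf = NeuralForecast(models=[model], freq='D', local_scaler_type='robust')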

zzzrbx commented 3 months ago

I'm using the code below, basically taken from the documentation. The error comes up more often as I increase the number of trials in Optuna (I'm running on CPU, without Ray Tune). Should the time series have a minimum length?

def config_nhits(trial):
    return {
        'futr_exog_list': futr_exog_list,
        'hist_exog_list': hist_exog_list,
        'max_steps': trial.suggest_int("max_steps", 100, 300),
        'input_size': h,
        'activation': 'ReLU',
        'scaler_type': 'robust',
        'pooling_mode': 'AvgPool1d',
        'learning_rate': trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True),  # suggest_loguniform is deprecated
        'n_pool_kernel_size': trial.suggest_categorical("n_pool_kernel_size", [[2, 2, 2], [16, 8, 1]]),
        'n_freq_downsample': trial.suggest_categorical("n_freq_downsample", [[168, 24, 1], [24, 12, 1], [1, 1, 1]]),
        'batch_size': trial.suggest_categorical("batch_size", [8, 16, 32]),  # suggest_int(8, 16, 32) would pass 32 as `step`
        'inference_windows_batch_size': 1,
        'random_seed': trial.suggest_int("random_seed", 1, 10),
        'val_check_steps': 10,
    }

models = [
    AutoNHITS(
        h=h,
        # loss=DistributionLoss(distribution='StudentT', level=[80, 90], return_params=True),
        config=config_nhits,
        search_alg=optuna.samplers.TPESampler(),
        backend='optuna',
        num_samples=50,
        cpus=20,
    )
]
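
For context, a minimal sketch (not part of the original report) of how this Auto model would typically be fitted, assuming a long-format dataframe `df` with columns ['unique_id', 'ds', 'y'] plus the exogenous columns, and a `futr_df` holding future exogenous values:

from neuralforecast import NeuralForecast

nf = NeuralForecast(models=models, freq='D')  # 'D' is an assumed frequency
nf.fit(df=df, val_size=h)                     # the validation split drives the tuning objective
preds = nf.predict(futr_df=futr_df)           # futr_df must carry the futr_exog_list columns
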
elephaint commented 2 months ago

I see you commented out the DistributionLoss - what distribution are you using when you see the error? And does the error also occur when you change the distribution type?

zzzrbx commented 2 months ago

I'm using Student-t most of the time, but it happens with Tweedie as well. I haven't tested other distributions yet.


elephaint commented 2 months ago

Can you test with a Normal distribution? I want to exclude the possibility of this being an issue related to the distributions.

Also, the initial error you showed above is (I think) from running the Student-t. What is the exact error you get with the Tweedie?
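
For reference, a sketch of the swap being suggested, reusing `h` and `config_nhits` from the earlier comment (the Normal loss parameters here are illustrative):

import optuna
from neuralforecast.auto import AutoNHITS
from neuralforecast.losses.pytorch import DistributionLoss

models = [
    AutoNHITS(
        h=h,
        loss=DistributionLoss(distribution='Normal', level=[80, 90]),  # Normal in place of StudentT/Tweedie
        config=config_nhits,
        search_alg=optuna.samplers.TPESampler(),
        backend='optuna',
        num_samples=50,
        cpus=20,
    )
]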

elephaint commented 2 months ago

Hey @zzzrbx just checking in - did you have any luck trying out with the Normal distribution?

github-actions[bot] commented 1 month ago

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one.

fariduca commented 1 month ago

I am facing the same problem when trying to use the Tweedie distribution, both when scaling on the model (batches) and when using local_scaler. I have tried the 'standard', 'robust', and 'boxcox' scaler types; all end up with the same problem. The code I am running:

autonhits = AutoNHITS(
    h=HORIZON,
    loss=DistributionLoss('Tweedie', level=[75], rho=1.5),
    num_samples=25,
    config=NHITS_objective_optuna,
    backend='optuna'
)

nf = NeuralForecast(
    models=[autonhits],    # AutoNHITS
    freq='MS',
    local_scaler_type=LOCAL_SCALER_TYPE,
)

nf.cross_validation(df=ts_train_df, verbose=False, step_size=HORIZON, refit=True,
                    val_size=HORIZON)

Here is the error I get:

File c:\...\torch\distributions\distribution.py:68, in Distribution.__init__(self, batch_shape, event_shape, validate_args)
     66         valid = constraint.check(value)
     67         if not valid.all():
---> 68             raise ValueError(
     69                 f"Expected parameter {param} "
     70                 f"({type(value).__name__} of shape {tuple(value.shape)}) "
     71                 f"of distribution {repr(self)} "
     72                 f"to satisfy the constraint {repr(constraint)}, "
     73                 f"but found invalid values:\n{value}"
     74             )
     75 super().__init__()

ValueError: Expected parameter concentration (Tensor of shape (1000, 128, 3)) of distribution Gamma(concentration: torch.Size([1000, 128, 3]), rate: torch.Size([1000, 128, 3])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[[0., 0., 0.],
         [0., 0., 0.],
         ...,
         [0., 0., 0.]],

        ...,

        [[0., 0., 0.],
         [0., 0., 0.],
         ...,
         [0., 0., 0.]]], device='cuda:0')

elephaint commented 1 month ago

@fariduca I think some of our bounds are too tight for the distributions; this seems to be the same issue you are experiencing. I am working on fixing this.
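
To illustrate what "bounds too tight" can mean here (an assumption about the mechanism, not the library's actual code): positivity is often enforced by passing raw network outputs through softplus, which underflows to exactly 0.0 for very negative inputs and then violates a strict GreaterThan(lower_bound=0.0) constraint like the Gamma concentration above.

import torch
import torch.nn.functional as F

raw = torch.tensor([-200.0, -20.0, 0.0])  # very negative raw outputs underflow
print(F.softplus(raw))  # tensor([0.0000e+00, 2.0612e-09, 6.9315e-01]) -- first entry is exactly 0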

github-actions[bot] commented 1 month ago

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one.

ll550 commented 2 weeks ago

Waiting for a solution. This kind of error is totally beyond my scope. :)