Nixtla / neuralforecast

Scalable and user-friendly neural 🧠 forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

Exogenous variables normalization #464

Closed. candalfigomoro closed this issue 1 year ago.

candalfigomoro commented 1 year ago

When I set the scaler_type parameter, does it also normalize exogenous variables?

If not, should we normalize them? For each time series (so for each unique_id) or globally across the whole dataframe?

Thanks @kdgutier

kdgutier commented 1 year ago

Hi @candalfigomoro, Thanks for using the library.

When I set the scaler_type parameter, does it also normalize exogenous variables?

Yes, the scaler normalizes the exogenous variables (except the static ones). TemporalNorm operates at the unique_id level.

Neural networks' non-linearities tend to struggle with widely varying time series scales. For this reason we added normalization capabilities built into the architectures. The available normalization strategies are explained here: https://nixtla.github.io/neuralforecast/common.scalers.html

You may also do your own normalization outside of the fit and predict methods.
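
For illustration, here is a minimal sketch of how the built-in normalization is enabled at model initialization (the horizon, frequency, and exogenous column names below are placeholders):

from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

horizon = 12
model = NHITS(
    h=horizon,
    input_size=2 * horizon,
    scaler_type='robust',               # built-in TemporalNorm; normalizes y and temporal exogenous features per unique_id
    futr_exog_list=['price', 'promo'],  # hypothetical future exogenous columns
    max_steps=100,
)
nf = NeuralForecast(models=[model], freq='M')
# nf.fit(df=train_df)  # train_df holds unique_id, ds, y and the exogenous columns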

candalfigomoro commented 1 year ago

@kdgutier

Thanks!

When I use the Poisson loss (loss=DistributionLoss(distribution='Poisson', level=[80, 90], return_params=False)) I can't really normalize the y target (since it is "count data"). Is there a way to only normalize exogenous variables?

kdgutier commented 1 year ago

Hi @candalfigomoro,

You can use NeuralForecast's built-in scaler_type initialization argument, which normalizes the inputs while still training against the original y data.

Here is the scalers documentation: https://nixtla.github.io/neuralforecast/common.scalers.html

Here is an example where we fit zero-inflated count data:

Let us know if this solves your problem. If the issue persists with scaler_type, would you be able to share the errors and a pipeline example?

candalfigomoro commented 1 year ago

Hi @kdgutier,

In my previous message I was not very clear. This is the problem:

I have a dataset with the unique_id, ds and y columns (and some exogenous variables). Unfortunately, I can't share the dataset. Values in y are >= 0 (they are count data).

I'm trying these models:

from neuralforecast.models import NHITS, NBEATSx
from neuralforecast.losses.pytorch import DistributionLoss, MAE

models = [
    NHITS(
        h=horizon,
        input_size=2 * horizon,
        scaler_type='robust',
        futr_exog_list=futr_exog_list,
        #stat_exog_list=stat_exog_list,
        max_steps=100,
        loss=DistributionLoss(distribution='Poisson', level=[80, 90], return_params=True),
    ),
    NBEATSx(
        h=horizon,
        input_size=2 * horizon,
        scaler_type='robust',
        futr_exog_list=futr_exog_list,
        #stat_exog_list=stat_exog_list,
        max_steps=100,
        loss=MAE(),
    ),
]

Let's focus on the NHITS model (notice scaler_type='robust').

When I use a loss function such as MAE or MSE, it works.

When I switch the loss to DistributionLoss(distribution='Poisson', level=[80, 90], return_params=True) I get the following error when I call fit():

/opt/conda/lib/python3.7/site-packages/torch/distributions/distribution.py in _validate_sample(self, value)
    292         if not valid.all():
    293             raise ValueError(
--> 294                 "Expected value argument "
    295                 f"({type(value).__name__} of shape {tuple(value.shape)}) "
    296                 f"to be within the support ({repr(support)}) "

ValueError: Expected value argument (Tensor of shape (1024, 12)) to be within the support (IntegerGreaterThan(lower_bound=0)) of the distribution Poisson(rate: torch.Size([1024, 12])), but found invalid values:
tensor([[ 7.,  2.,  4.,  ...,  4.,  1.,  4.],
        [ 5.,  1.,  2.,  ...,  0.,  0.,  0.],
        [ 4.,  3.,  1.,  ...,  0.,  0.,  0.],
        ...,
        [ 0.,  0.,  0.,  ...,  0.,  0.,  0.],
        [27., 38.,  0.,  ...,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  ...,  0.,  2.,  0.]])

However, the error disappears when I remove scaler_type='robust'. It looks like the scaler is generating negative values when it shouldn't (the minmax scaler also fails).

The funny thing is that, if I also remove futr_exog_list, I get this other error after about 5 mini-batches of training:

/opt/conda/lib/python3.7/site-packages/torch/distributions/distribution.py in __init__(self, batch_shape, event_shape, validate_args)
     54                 if not valid.all():
     55                     raise ValueError(
---> 56                         f"Expected parameter {param} "
     57                         f"({type(value).__name__} of shape {tuple(value.shape)}) "
     58                         f"of distribution {repr(self)} "

ValueError: Expected parameter rate (Tensor of shape (1024, 12)) of distribution Poisson(rate: torch.Size([1024, 12])) to satisfy the constraint GreaterThanEq(lower_bound=0.0), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], grad_fn=<SoftplusBackward0>)

So, basically, the only combination that makes the Poisson loss work is keeping futr_exog_list but removing scaler_type. In this way, the exogenous variables are obviously not normalized.

I don't have these issues if I use the MAE() or MSE() loss.

Thanks

candalfigomoro commented 1 year ago

@kdgutier

Decreasing the learning rate helps with the all-NaN tensors (exploding gradients?). Still, there's the problem with scaling and the IntegerGreaterThan(lower_bound=0) error.

P.S. Changing activation to "Tanh" also helps with the all-NaN tensors issue.
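
Roughly, these are the changes I tried (values are just first attempts, not tuned; futr_exog_list and scaler_type are omitted here because that is the configuration where the NaNs appeared):

from neuralforecast.models import NHITS
from neuralforecast.losses.pytorch import DistributionLoss

model = NHITS(
    h=horizon,
    input_size=2 * horizon,
    max_steps=100,
    learning_rate=1e-4,  # lowered from the default to tame the (suspected) exploding gradients
    activation='Tanh',   # bounded activation also avoids the all-NaN rate tensor
    loss=DistributionLoss(distribution='Poisson', level=[80, 90], return_params=True),
)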

kdgutier commented 1 year ago

Hi @candalfigomoro,

Glad to know that reducing the learning rate helped with the exploding gradients.

candalfigomoro commented 1 year ago

@kdgutier

Scaling the data improves results when using the MSE loss; unfortunately, I'm not able to use scaling with the Poisson loss. I'm still not sure what's going on.
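
As a stopgap, following your earlier suggestion to normalize outside of fit/predict, I could standardize only the exogenous columns per unique_id and leave y as raw counts. A rough sketch (column names are placeholders):

import pandas as pd

def scale_exog_per_series(df: pd.DataFrame, exog_cols: list) -> pd.DataFrame:
    # Standardize only the exogenous columns, per series, leaving y untouched.
    out = df.copy()
    grouped = out.groupby('unique_id')[exog_cols]
    means = grouped.transform('mean')
    stds = grouped.transform('std').fillna(1.0).replace(0, 1.0)  # guard against constant/short series
    out[exog_cols] = (out[exog_cols] - means) / stds
    return out

# train_df = scale_exog_per_series(train_df, futr_exog_list)

The same per-series statistics would also have to be applied to the future exogenous values passed to predict.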

kdgutier commented 1 year ago

Thanks for reporting this @candalfigomoro, I will take a careful look at it and get back to you.

Meanwhile, you may want to try other loss functions; the GMM and MQLoss have better numerical stability: https://nixtla.github.io/neuralforecast/losses.pytorch.html#multi-quantile-loss-mqloss
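
For example, swapping the loss would look roughly like this (the rest of the configuration stays as in your snippet):

from neuralforecast.models import NHITS
from neuralforecast.losses.pytorch import MQLoss

model = NHITS(
    h=horizon,
    input_size=2 * horizon,
    scaler_type='robust',
    futr_exog_list=futr_exog_list,
    max_steps=100,
    loss=MQLoss(level=[80, 90]),  # quantile-based loss with better numerical stability
)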

kdgutier commented 1 year ago

Hey @candalfigomoro,

I went through the numerical stability of the Poisson distribution. Here is a working example: https://github.com/Nixtla/neuralforecast/blob/main/nbs/examples/HierarchicalNetworks.ipynb

Would you be able to check whether the improvements help with your data?

candalfigomoro commented 1 year ago

@kdgutier Thank you very much!

Using the Poisson distribution seems more stable now (unfortunately it doesn't seem to provide good results for my forecasting problem, so I'll try other losses such as the Negative Binomial).
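
If I understand the losses module correctly, the swap should be as small as this (assuming DistributionLoss accepts 'NegativeBinomial' as a distribution name; to be checked against the losses documentation):

from neuralforecast.losses.pytorch import DistributionLoss

# Hypothetical: switch the likelihood to a Negative Binomial for over-dispersed counts
loss = DistributionLoss(distribution='NegativeBinomial', level=[80, 90], return_params=False)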

kdgutier commented 1 year ago

Good to know, and thanks for reporting Poisson's numerical instability. We also recently included the Tweedie distribution, which is specialized for zero-inflated variables.