Nixtla / neuralforecast

Scalable and user-friendly neural 🧠 forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

Default windows_batch_size not big enough to include all windows in each batch #949

Closed Newaij0 closed 5 months ago

Newaij0 commented 6 months ago

What happened + What you expected to happen

1. Confusion in the description of windows_batch_size

The documentation of the model describes windows_batch_size as: "windows_batch_size: int=1024, number of windows to sample in each training batch, default uses all." However, this documentation is confusing: the number of windows in each batch may exceed 1024, so the default of 1024 does not actually use all windows.

For example, in my case, I'm using a time-series dataset with 300 groups, each containing a 730-day series. If I understand the model correctly, setting batch_size=15 means one batch will contain 15 groups, so 20 steps make up one epoch. And since I set step_size=1 and horizon=30, each batch will contain roughly 15 × (730 − 30) = 10,500 windows.
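A back-of-the-envelope version of this calculation in Python (input_size here is a placeholder assumption, since the rough 730 − 30 estimate above ignores it):

```python
# Rough windows-per-batch estimate for the setup above:
# 300 series of 730 days, batch_size=15, horizon=30, step_size=1.
n_series, series_len = 300, 730
batch_size, horizon, step_size = 15, 30, 1
input_size = 60  # placeholder assumption; not stated above

batches_per_epoch = n_series // batch_size            # 300 / 15 = 20
rough_windows = batch_size * (series_len - horizon)   # 15 * 700 = 10,500

# The exact unfold count also subtracts input_size:
window_len = input_size + horizon                     # 90
exact_windows = batch_size * ((series_len - window_len) // step_size + 1)
print(batches_per_epoch, rough_windows, exact_windows)  # 20 10500 9615
```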

2. Cannot use all windows if groups contain series of different lengths

In the above case, windows_batch_size can be calculated easily and set to a fixed value. But when each group contains a series of a different length, I can't set the number of windows for each batch manually. Is it possible to set windows_batch_size automatically based on the parse_window result?

Versions / Dependencies

Python 3.11, neuralforecast 1.6.4

Reproduction script

https://nixtlaverse.nixtla.io/neuralforecast/common.base_windows.html

Issue Severity

None

elephaint commented 5 months ago

I'm not sure what issue you are experiencing; do you get an error? Bad forecasting results? If so, please provide a standalone piece of code that we can use to reproduce the issue.

Generally, windows are created by unfolding each time series according to a window size (input_size + horizon) and a step_size. However, as you correctly note, some time series in the dataset may not be available for all timesteps. After creating the windows, neuralforecast selects only those samples which are available. Thus, the final number of samples actually trained on can be much smaller.
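As a rough sketch of that unfold-and-filter step (assumed shapes; this is not the library's actual implementation):

```python
import torch

# Minimal sketch of the unfolding described above. `temporal` is a batch
# of series with shape [batch, T, C]; `available` is a [batch, T] mask of
# valid timesteps. Windows containing any unavailable timestep are dropped.
def make_windows(temporal, available, input_size, horizon, step_size=1):
    window_len = input_size + horizon
    # Unfold along time: [batch, T, C] -> [batch, n_windows, C, window_len]
    windows = temporal.unfold(dimension=1, size=window_len, step=step_size)
    avail = available.unfold(dimension=1, size=window_len, step=step_size)
    # Flatten batch and window axes: -> [batch * n_windows, C, window_len]
    windows = windows.flatten(0, 1)
    avail = avail.flatten(0, 1)
    # Keep fully available windows, then move time to the middle axis:
    # -> [n_kept, window_len, C]
    keep = avail.all(dim=-1)
    return windows[keep].permute(0, 2, 1)

# e.g. 15 series of 730 steps, one channel, 5 series missing 40 early steps
temporal = torch.randn(15, 730, 1)
available = torch.ones(15, 730, dtype=torch.bool)
available[:5, :40] = False
w = make_windows(temporal, available, input_size=60, horizon=30)
print(w.shape)  # torch.Size([9415, 90, 1]): 9,615 raw windows minus 200 dropped
```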

An example (following the numbers you provided): with horizon=30 and a window length of input_size + horizon = 90, suppose that after unfolding and filtering out the unavailable samples we are left with windows of shape [5900, 90, C], i.e. 5,900 available windows with C channels.

Now, if windows_batch_size is not None, we sample from those windows: from the tensor of shape [5900, 90, C] we draw windows_batch_size windows, so the final windows shape will be [windows_batch_size, 90, C]. In case windows_batch_size > n_windows, we sample with replacement, so the same window may occur multiple times in the batch. Otherwise, the random sample is a subselection of the available windows.
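A minimal sketch of that sampling logic (shapes follow the example above; this is not the library's actual code):

```python
import torch

# Given `windows` of shape [n_windows, window_len, C], draw
# windows_batch_size of them, with replacement only when there are not
# enough windows to go around.
def sample_windows(windows, windows_batch_size):
    n_windows = windows.shape[0]
    if windows_batch_size > n_windows:
        # Not enough windows: sample with replacement (duplicates possible).
        idx = torch.randint(0, n_windows, (windows_batch_size,))
    else:
        # Enough windows: a random subselection without replacement.
        idx = torch.randperm(n_windows)[:windows_batch_size]
    return windows[idx]

sampled = sample_windows(torch.randn(5900, 90, 3), windows_batch_size=1024)
print(sampled.shape)  # torch.Size([1024, 90, 3])
```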

Thus, neuralforecast already handles the unavailability of series within a batch (and dataset) internally.

Does this explanation solve the confusion you had?

Newaij0 commented 5 months ago

Many thanks for the explanation.

I'm getting bad forecasting results and initially attributed the problem to a lack of training data, because I use the default windows_batch_size=1024, which is only 1024/10065 ≈ 10% of the samples in each step.

According to the explanation, however, does this mean that I have to do an additional calculation before training in order to set a proper windows_batch_size?

Moreover, if the number of groups (N) is not divisible by batch_size (B) (unlike the example above), the last batch will contain fewer than B groups, and after parsing into windows its n_windows will be smaller than that of the previous batches. Therefore, if windows_batch_size is set between the last batch's n_windows and the other batches' n_windows, samples in the last batch will be drawn multiple times (with replacement), while samples in the other batches will be randomly excluded in each epoch. In other words, the last batch seems to be up-weighted by the sampling mechanism.
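A quick numeric illustration of this effect (hypothetical figures, with N deliberately not divisible by B):

```python
# 310 series in batches of 15 -> the last batch holds only 10 series.
N, B = 310, 15
windows_per_series = 700        # the rough 730 - 30 estimate from above
windows_batch_size = 8000       # hypothetical, between the two n_windows

full_batch = B * windows_per_series        # 10,500 windows
last_batch = (N % B) * windows_per_series  # 10 * 700 = 7,000 windows

# Expected number of times each window is drawn per epoch:
p_full = windows_batch_size / full_batch   # ~0.76: subsampled
p_last = windows_batch_size / last_batch   # ~1.14: oversampled (with replacement)
print(p_full, p_last)
```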

elephaint commented 5 months ago

For most of our Auto* models, we use a default tuning space of [128, 256, 512, 1024] for windows_batch_size, which is appropriate for most cases. If resources allow it, feel free to experiment with a higher number, although it doesn't necessarily lead to better results.
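For reference, a hedged sketch of overriding that tuning space with a custom config dict, assuming the Ray backend (the keys besides windows_batch_size are illustrative placeholders, not the full default search space):

```python
from ray import tune
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoNHITS

# Custom search space; only windows_batch_size is the point here, the
# other entries are illustrative assumptions.
config = {
    "input_size": tune.choice([60, 120]),
    "learning_rate": tune.loguniform(1e-4, 1e-2),
    "windows_batch_size": tune.choice([1024, 2048, 4096]),
    "max_steps": 1000,
}
model = AutoNHITS(h=30, config=config, num_samples=10)
nf = NeuralForecast(models=[model], freq="D")
```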

I think your reasoning about the last batch is correct; to avoid this you could also simply drop it by setting drop_last_loader=True, although I don't think this will typically 'move the needle' much in terms of forecasting performance.
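A minimal sketch of that setting (the model choice and other hyperparameters are placeholders, not recommendations):

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

# drop_last_loader=True drops the smaller final batch of each epoch,
# so no batch gets oversampled relative to the others.
model = NHITS(
    h=30,
    input_size=60,
    windows_batch_size=1024,
    drop_last_loader=True,
)
nf = NeuralForecast(models=[model], freq="D")
```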

In general, if you're having bad forecasting results, these parameters would be (very far) down on my list of things to tune. Usually what's more important is:

I'd be tuning / checking all these knobs first before turning to a less important parameter such as windows_batch_size.