Nixtla / neuralforecast

Scalable and user-friendly neural 🧠 forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

Default windows_batch_size not big enough to include all windows in each batch #949

Closed Newaij0 closed 5 months ago

Newaij0 commented 6 months ago

What happened + What you expected to happen

1. Confusion in the description of windows_batch_size

The documentation of the model describes windows_batch_size as: "windows_batch_size: int=1024, number of windows to sample in each training batch, default uses all." However, this documentation is confusing: the number of windows in each batch may exceed 1024, so the default of 1024 does not actually use all windows.

For example, in my case, I'm using a time-series dataset with 300 groups, each containing a 730-day series. If I understand the model correctly, setting batch_size=15 means one batch will contain 15 groups, so 20 steps make up one epoch. And since I set step_size=1 and horizon=30, each batch will contain roughly 15 × (730 − 30) = 10,500 windows.
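A back-of-the-envelope version of this calculation in Python (input_size here is a placeholder assumption, since the rough 730 − 30 estimate above ignores it):

```python
# Rough windows-per-batch estimate for the setup above:
# 300 series of 730 days, batch_size=15, horizon=30, step_size=1.
n_series, series_len = 300, 730
batch_size, horizon, step_size = 15, 30, 1
input_size = 60  # placeholder assumption; not stated above

batches_per_epoch = n_series // batch_size            # 300 / 15 = 20
rough_windows = batch_size * (series_len - horizon)   # 15 * 700 = 10,500

# The exact unfold count also subtracts input_size:
window_len = input_size + horizon                     # 90
exact_windows = batch_size * ((series_len - window_len) // step_size + 1)
print(batches_per_epoch, rough_windows, exact_windows)  # 20 10500 9615
```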

2. Cannot use all windows if groups contain series of different lengths

In the above case, windows_batch_size can be calculated easily and set to a fixed value. But when each group contains a series of a different length, I can't set the number of windows for each batch manually. Is it possible to set windows_batch_size automatically based on the parse_window result?

Versions / Dependencies

Python 3.11, neuralforecast 1.6.4

Reproduction script

https://nixtlaverse.nixtla.io/neuralforecast/common.base_windows.html

Issue Severity

None

elephaint commented 5 months ago

I'm not sure what issue you are experiencing; do you get an error? Bad forecasting results? If so, please provide a standalone piece of code that we can use to reproduce the issue.

Generally, windows are created by unfolding each time series according to a window size (input_size + horizon) and a step_size. However, as you correctly note, some time series in the dataset may not be available for all timesteps. After creating the windows, neuralforecast selects only those samples which are available. Thus, the final number of samples actually trained on can be much smaller.
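As a rough sketch of that unfold-and-filter step (assumed shapes; this is not the library's actual implementation):

```python
import torch

# Minimal sketch of the unfolding described above. `temporal` is a batch
# of series with shape [batch, T, C]; `available` is a [batch, T] mask of
# valid timesteps. Windows containing any unavailable timestep are dropped.
def make_windows(temporal, available, input_size, horizon, step_size=1):
    window_len = input_size + horizon
    # Unfold along time: [batch, T, C] -> [batch, n_windows, C, window_len]
    windows = temporal.unfold(dimension=1, size=window_len, step=step_size)
    avail = available.unfold(dimension=1, size=window_len, step=step_size)
    # Flatten batch and window axes: -> [batch * n_windows, C, window_len]
    windows = windows.flatten(0, 1)
    avail = avail.flatten(0, 1)
    # Keep fully available windows, then move time to the middle axis:
    # -> [n_kept, window_len, C]
    keep = avail.all(dim=-1)
    return windows[keep].permute(0, 2, 1)

# e.g. 15 series of 730 steps, one channel, 5 series missing 40 early steps
temporal = torch.randn(15, 730, 1)
available = torch.ones(15, 730, dtype=torch.bool)
available[:5, :40] = False
w = make_windows(temporal, available, input_size=60, horizon=30)
print(w.shape)  # torch.Size([9415, 90, 1]): 9,615 raw windows minus 200 dropped
```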

An example (following the numbers you provided): with horizon=30 and a window length of input_size + horizon = 90, suppose that after unfolding and filtering out the unavailable samples we are left with windows of shape [5900, 90, C], i.e. 5,900 available windows with C channels.

Now, if windows_batch_size is not None, we sample from those windows: from the tensor of shape [5900, 90, C] we draw windows_batch_size windows, so the final windows shape will be [windows_batch_size, 90, C]. In case windows_batch_size > n_windows, we sample with replacement, so the same window may occur multiple times in the batch. Otherwise, the random sample is a subselection of the available windows.
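A minimal sketch of that sampling logic (shapes follow the example above; this is not the library's actual code):

```python
import torch

# Given `windows` of shape [n_windows, window_len, C], draw
# windows_batch_size of them, with replacement only when there are not
# enough windows to go around.
def sample_windows(windows, windows_batch_size):
    n_windows = windows.shape[0]
    if windows_batch_size > n_windows:
        # Not enough windows: sample with replacement (duplicates possible).
        idx = torch.randint(0, n_windows, (windows_batch_size,))
    else:
        # Enough windows: a random subselection without replacement.
        idx = torch.randperm(n_windows)[:windows_batch_size]
    return windows[idx]

sampled = sample_windows(torch.randn(5900, 90, 3), windows_batch_size=1024)
print(sampled.shape)  # torch.Size([1024, 90, 3])
```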

Thus, neuralforecast already handles the unavailability of series within a batch (and dataset) internally.

Does this explanation solve the confusion you had?

Newaij0 commented 5 months ago

Many thanks for the explanation.

I'm getting bad forecasting results and initially attributed the problem to a lack of training data, because I use the default windows_batch_size=1024, which is only 1024/10065 ≈ 10% of the samples in each step.

According to the explanation, however, does this mean that I have to do an additional calculation before training in order to set a proper windows_batch_size?

Moreover, if the number of groups (N) is not divisible by batch_size (B) (unlike the example above), the last batch will contain fewer than B groups, and after parsing into windows its n_windows will be smaller than that of the previous batches. Therefore, if windows_batch_size is set between the last batch's n_windows and the other batches' n_windows, samples in the last batch will be drawn multiple times (with replacement), while samples in the other batches will be randomly excluded in each epoch. In other words, the last batch seems to be up-weighted by the sampling mechanism.
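A quick numeric illustration of this effect (hypothetical figures, with N deliberately not divisible by B):

```python
# 310 series in batches of 15 -> the last batch holds only 10 series.
N, B = 310, 15
windows_per_series = 700        # the rough 730 - 30 estimate from above
windows_batch_size = 8000       # hypothetical, between the two n_windows

full_batch = B * windows_per_series        # 10,500 windows
last_batch = (N % B) * windows_per_series  # 10 * 700 = 7,000 windows

# Expected number of times each window is drawn per epoch:
p_full = windows_batch_size / full_batch   # ~0.76: subsampled
p_last = windows_batch_size / last_batch   # ~1.14: oversampled (with replacement)
print(p_full, p_last)
```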

elephaint commented 5 months ago

For most of our Auto* models, we use a default tuning space of [128, 256, 512, 1024] for windows_batch_size, which is appropriate for most cases. If resources allow it, feel free to experiment with a higher number, although it doesn't necessarily lead to better results.
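For reference, a hedged sketch of overriding that tuning space with a custom config dict, assuming the Ray backend (the keys besides windows_batch_size are illustrative placeholders, not the full default search space):

```python
from ray import tune
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoNHITS

# Custom search space; only windows_batch_size is the point here, the
# other entries are illustrative assumptions.
config = {
    "input_size": tune.choice([60, 120]),
    "learning_rate": tune.loguniform(1e-4, 1e-2),
    "windows_batch_size": tune.choice([1024, 2048, 4096]),
    "max_steps": 1000,
}
model = AutoNHITS(h=30, config=config, num_samples=10)
nf = NeuralForecast(models=[model], freq="D")
```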

I think your reasoning about the last batch is correct; to avoid this you could also simply drop it by setting drop_last_loader=True, although I don't think this will typically 'move the needle' much in terms of forecasting performance.
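A minimal sketch of that setting (the model choice and other hyperparameters are placeholders, not recommendations):

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

# drop_last_loader=True drops the smaller final batch of each epoch,
# so no batch gets oversampled relative to the others.
model = NHITS(
    h=30,
    input_size=60,
    windows_batch_size=1024,
    drop_last_loader=True,
)
nf = NeuralForecast(models=[model], freq="D")
```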

In general, if you're having bad forecasting results, these parameters would be (very far) down on my list of things to tune. Usually what's more important is:

I'd be tuning / checking all these knobs first before turning to a less important parameter such as windows_batch_size.