Closed mariosantosprivate closed 4 years ago
Hello all,
After reading the documentation I'm still a bit confused about the context_length parameter, and I couldn't find much online. As far as I understand, context_length defines the number of time steps to look back when predicting the next prediction_length time steps.
Let's say I have a time series spanning 3 weeks (freq = D), and I use the first 2 weeks for training and the 3rd for testing (the last 7 days).
* If I set prediction_length = context_length = 7, does that mean the model will only "see" values from the 2nd week and ignore the 1st, even though the training set spans 2 weeks?
As far as I understand the code as well as the tutorials, during training, sequences of length context_length + prediction_length are sampled from the training dataset. The first part of each sequence is used as input to the estimator and the second part to calculate the loss. In your case the first week is used as input to the estimator (or "seen", in your terms) and the second week to calculate the training loss.
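The sampling described above can be sketched in plain Python (a simplified illustration, not GluonTS's actual sampler; the function name is made up for this example):

```python
import random

def sample_training_window(series, context_length, prediction_length):
    """Sample one training window of length context_length + prediction_length
    and split it into the model input and the target used for the loss."""
    total = context_length + prediction_length
    if len(series) < total:
        raise ValueError("series is shorter than context_length + prediction_length")
    start = random.randrange(len(series) - total + 1)
    window = series[start:start + total]
    return window[:context_length], window[context_length:]

# 14 days of training data, context_length = prediction_length = 7:
# only one window fits, so every sample covers the full two weeks.
past, future = sample_training_window(list(range(14)), 7, 7)
```

With only 14 observations and a total window length of 14, the sampler has exactly one valid start position, which is the situation in the original question.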
* If that's the case, does that mean context_length should equal time_series_length - prediction_length, so that the model sees data from the complete date range (training-set start date to training-set end date)?
The optimal case would be a training set that allows sampling of multiple sequences of length context_length + prediction_length. For example, if you have 28 days of observations (4 weeks), context_length = 14 days and prediction_length = 7 days, then during training multiple sequences of length 21 are sampled; the first two weeks of each are used as input and the last week to calculate the training loss.
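This setup maps directly onto a GluonTS estimator configuration. A minimal sketch, assuming the mxnet-based DeepAREstimator (the import path varies across GluonTS versions):

```python
from gluonts.model.deepar import DeepAREstimator

# 28 days of daily observations: condition on 14 days of history
# to predict the next 7 days.
estimator = DeepAREstimator(
    freq="D",
    context_length=14,     # length of the conditioning range (model input)
    prediction_length=7,   # length of the forecast horizon (loss target)
)
```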
@kaijennissen I think I might have misunderstood when you said:
The optimal case would be that you have a training set that allows sampling of multiple sequences of length equal to context_length + prediction_length.
For example, if I have 84 days (12 weeks) and prediction_length = 7, is context_length = 14 still optimal, since the training set still allows sampling of multiple sequences of length 21 (in this case 4 non-overlapping sequences of length 21)? And does that mean that in the same example (84 days), using context_length = 35 (giving ctx_len + pred_len = 42) is also optimal, since the training set still allows sampling of multiple sequences of length 42 (in this case 2 non-overlapping sequences of length 42)?
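The window counting in the question can be checked with a one-line helper (a hypothetical function for this arithmetic, not part of GluonTS):

```python
def num_nonoverlapping_windows(series_length, context_length, prediction_length):
    # Number of disjoint training windows of length
    # context_length + prediction_length that fit in the series.
    return series_length // (context_length + prediction_length)

# 84 days of data, prediction_length = 7:
assert num_nonoverlapping_windows(84, 14, 7) == 4  # windows of length 21
assert num_nonoverlapping_windows(84, 35, 7) == 2  # windows of length 42
```

Note that samplers are free to draw overlapping windows as well, so the number of distinct training sequences is usually much larger than this disjoint count.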
So I'm still confused, does this mean there is no optimal value but, in fact several?
@mariosantosprivate What I mean is that with a longer history, it is possible to sample several different subsequences of the time series rather than only one (which was the case in your initial example). It is hard to say what the optimal value for context_length is, since this heavily depends on the characteristics of your time series (i.e. seasonality). For example, with daily data you have a weekly seasonality, but often also a yearly one (and sometimes a monthly one).
@kaijennissen Ah, now I understand, thank you!