awslabs / gluonts

Probabilistic time series modeling in Python
https://ts.gluon.ai
Apache License 2.0

What happens when giving more than context_length time steps to Predictor ? #1067

Open StanislasGuinel opened 3 years ago

StanislasGuinel commented 3 years ago

I trained a DeepAR estimator with a specific context_length and prediction_length. Then I want to predict prediction_length future values. I don't understand why I get different predictions depending on how many time steps I give to the Predictor. I thought the estimator would only look at the last context_length time steps to make predictions, but when I give more than context_length time steps, I get different (and seemingly worse) results. How many time steps should I give the Predictor? Is giving exactly context_length time steps the best practice?

dai-ichiro commented 3 years ago

Please see this page: https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html#deepar_best_practices

Except for when splitting your dataset for training and testing, always provide the entire time series for training, testing, and when calling the model for inference. Regardless of how you set context_length, don't break up the time series or provide only a part of it. The model uses data points further back than the value set in context_length for the lagged values feature.
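A minimal sketch of what that looks like in GluonTS, assuming a trained predictor, a pandas Series full_series holding the complete observed history, and an hourly frequency (these names and values are placeholders, not taken from your setup):

from gluonts.dataset.common import ListDataset

# Wrap the *entire* history, not just the last context_length points;
# the predictor will still only forecast prediction_length steps ahead.
inference_ds = ListDataset(
    [{"start": full_series.index[0], "target": full_series.values}],
    freq="H",  # use the frequency the estimator was trained with
)

forecast = next(predictor.predict(inference_ds))
print(forecast.mean)  # prediction_length point forecasts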

StanislasGuinel commented 3 years ago

Thanks for your answer! I am also wondering whether the lagged-values feature that makes the model look further back than context_length exists in other types of models, such as simplefeedforward, deepfactor, lstnet, or npts?

dai-ichiro commented 3 years ago

Sorry, I don't know more than that.

StatMixedML commented 3 years ago

I am also wondering if the lagged values feature that makes the model look further back than the context_length

Yes, that's right. Citing from https://github.com/aws/amazon-sagemaker-examples/issues/390:

context_length should not be confused with history length. DeepAR model internally takes the target values that occur before the context_length time points as features (known as lags in auto-regressive models). So you should always provide much more than context_length data points since context length is typically small (usually equal to prediction length).

Also, from https://github.com/awslabs/gluon-ts/blob/master/src/gluonts/model/deepar/_estimator.py#L230 you can see:

self.history_length = self.context_length + max(self.lags_seq)

Basically, the context_length corresponds to the encoder length and the prediction_length to the decoder length.
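Putting the two together, here is a rough sketch with made-up values (hourly data, context_length = prediction_length = 24; the imports follow the MXNet-based estimator referenced above and may differ in other GluonTS versions):

from gluonts.model.deepar import DeepAREstimator
from gluonts.time_feature import get_lags_for_frequency

context_length = 24       # encoder length
prediction_length = 24    # decoder length

# When lags_seq is not passed explicitly, DeepAR derives it from the frequency.
lags_seq = get_lags_for_frequency(freq_str="H")

estimator = DeepAREstimator(
    freq="H",
    context_length=context_length,
    prediction_length=prediction_length,
)

# Same computation as the quoted line from _estimator.py: the network actually
# consumes context_length + max(lags_seq) past target values, far more than
# context_length alone.
history_length = context_length + max(lags_seq)
print("largest lag:", max(lags_seq))
print("history_length:", history_length)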


At each time step t, the inputs to the network are the covariates x_{i,t}, the target value at the previous time step z_{i,t−1}, and the previous network output h_{i,t−1}. So the lagged values are part of the covariates x_{i,t}.
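To make the recurrence explicit in the paper's notation (my reading of it):

h_{i,t} = h(h_{i,t−1}, z_{i,t−1}, x_{i,t}, Θ)

and the parameters of the output distribution are θ_{i,t} = θ(h_{i,t}, Θ), which is what the likelihood of z_{i,t} is conditioned on during training and sampling.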
