awslabs / gluonts

Probabilistic time series modeling in Python
https://ts.gluon.ai
Apache License 2.0

Why should the validation or testing dataset use the whole data? #3199

Open arieffadhlan opened 1 week ago

arieffadhlan commented 1 week ago

Description

I'm still new to the DeepAR model. The AWS SageMaker documentation makes the following statement, and it looks like GluonTS follows the same approach: "You can create training and test datasets that satisfy this criteria by using the entire dataset (the full length of all time series that are available) as a test set and removing the last prediction_length points from each time series for training."

Why should the testing dataset use the whole data?

For example, suppose I have 1000 daily data points and want to predict the next 30 days. The training data would be points 1 to 970. Why does the testing data use points 1 to 1000?
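
To make the split in this example concrete, here is a minimal sketch in plain NumPy. It does not use the GluonTS or SageMaker API, and the variable names are illustrative assumptions. It also assumes that at evaluation time the last prediction_length points of each test series are withheld as the forecast target and the points before them serve as conditioning history.

```python
import numpy as np

prediction_length = 30

# Stand-in for 1000 daily observations; the values 1..1000 double as point labels.
series = np.arange(1, 1001, dtype=float)

# Training series: the full series with the last prediction_length points removed.
train_target = series[:-prediction_length]   # points 1..970

# Test series: the full series.
test_target = series                         # points 1..1000

# Assumption: during evaluation the final window is withheld and forecast,
# so the model conditions on points 1..970 and is scored on points 971..1000.
test_context = test_target[:-prediction_length]
test_labels = test_target[-prediction_length:]

print(len(train_target), len(test_target))   # sizes of the two series
print(test_labels[0], test_labels[-1])       # first and last held-out point
```

Under that assumption, the history used at test time is identical to the training series, and the only points scored are the 30 that training never saw.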