TaherHabib closed this issue 3 years ago.
Hey there. The reason we can't use K-fold cross-validation is that it would introduce data leakage.
For time series, every validation fold must come after the data the model is trained on. If you treat your time series observations as ordered training data, K-fold splits that hold out a portion from the middle of the sequence still train the model on the observations that follow that portion. This gives the model information it should not have a priori (namely, the values of future observations). By using the approaches to CV we've implemented in the package, you can prevent data leakage:
Does that answer your question?
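To make the leakage argument concrete, here is a minimal pure-Python sketch that only builds index splits and does no model fitting. `kfold_splits`, `rolling_origin_splits` and `leaks_future` are hypothetical helpers written for this illustration. They are not part of pmdarima's API. The rolling-origin (expanding-window) scheme is one common time-series CV approach, and it is assumed here only to resemble what the package's splitters do.

```python
# Hypothetical helpers for illustration only; these are not pmdarima functions.

def kfold_splits(n, k):
    """Contiguous K-fold: each block of indices is held out once; all other indices are training data."""
    fold_size = n // k
    splits = []
    for i in range(k):
        start = i * fold_size
        stop = n if i == k - 1 else start + fold_size
        test = list(range(start, stop))
        train = [t for t in range(n) if t < start or t >= stop]
        splits.append((train, test))
    return splits


def rolling_origin_splits(n, initial, horizon, step):
    """Expanding window: train on [0, origin), validate on the next `horizon` points, then move the origin by `step`."""
    splits = []
    origin = initial
    while origin + horizon <= n:
        train = list(range(origin))
        test = list(range(origin, origin + horizon))
        splits.append((train, test))
        origin += step
    return splits


def leaks_future(splits):
    """True if any split trains on an observation that comes after the first validation point."""
    return any(max(train) > min(test) for train, test in splits if train)


n = 20
print(leaks_future(kfold_splits(n, k=5)))
print(leaks_future(rolling_origin_splits(n, initial=10, horizon=2, step=2)))
```

The first check is true because the first K-fold split validates on indices 0 to 3 while training on indices 4 to 19. The second check is false because every rolling-origin split trains only on indices strictly before its validation window.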
Thanks for the response!
I agree with your reasoning that K-fold CV should not be used on time series data, so that the temporal order is preserved during training and testing. However, my request for K-fold was motivated by the fact that ARIMA models are applied to stationary stochastic processes (possibly after differencing), whose statistical distribution has a mean, variance, etc. that do not change over time. As a result, it makes sense to me to apply K-fold CV, at least to strongly stationary time series data. But I am not so sure about this.
Please let me know if there's anything missing here :)
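One way to frame the stationarity point: stationarity fixes the distribution over time, but it does not make the observations independent. The following minimal pure-Python sketch simulates a stationary AR(1) process and measures its lag-1 sample autocorrelation. The helper names are hypothetical and were written for this illustration. With `phi = 0.9`, neighbouring observations remain strongly correlated, so in a K-fold split the training points adjacent to a held-out block still carry information about it. Whether that invalidates K-fold for a particular ARIMA fit is the open question in this thread. The sketch only illustrates the dependence.

```python
import random


def simulate_ar1(n, phi, seed=0, burn_in=500):
    """Simulate a stationary AR(1): y_t = phi * y_{t-1} + e_t, with |phi| < 1 and standard Gaussian noise."""
    rng = random.Random(seed)
    y = 0.0
    out = []
    for t in range(n + burn_in):
        y = phi * y + rng.gauss(0.0, 1.0)
        if t >= burn_in:  # discard the burn-in so the zero start value does not matter
            out.append(y)
    return out


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


series = simulate_ar1(5000, phi=0.9)
noise = simulate_ar1(5000, phi=0.0, seed=1)  # phi = 0 gives independent white noise
print(round(lag1_autocorr(series), 2))  # expected to be close to phi = 0.9; exact value depends on the seed
print(round(lag1_autocorr(noise), 2))   # expected to be close to 0
```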
Hello,
Great work with pmdarima! Thanks :)
However, I wonder why a K-fold cross-validation scheme is not provided in the package.
Since ARIMA is an autoregressive model requiring the (differenced) data to be stationary, I don't understand why K-fold cross-validation could not be used in this case.