dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Rolling Cross-validation for Time-series #1026

Open justinormont opened 5 years ago

justinormont commented 5 years ago

To properly handle time-series (and time-dependent data in general), we should implement rolling cross-validation alongside our existing CV and TrainTest modes.

We are currently merging various time-series functionality from the internal repo into this repo via https://github.com/dotnet/machinelearning/pull/977 "Port Time Series". That PR does not include rolling cross-validation, which is used heavily in time-series tasks.

Rolling CV is better suited to time-dependent datasets because it always tests on data that is newer than the training data. Standard CV leaks future data into the training set. Other names for rolling CV include { walk-forward / roll-forward / rolling origin / window } CV.
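To make the splitting scheme concrete, here is a minimal sketch in Python of how rolling CV folds could be generated. This is not an ML.NET API; the function name `rolling_cv_splits` and its parameters are hypothetical and exist only for illustration. It assumes the rows are already sorted from oldest to newest, and it uses equal-sized test blocks placed at the end of the data.

```python
from typing import Iterator, List, Optional, Tuple


def rolling_cv_splits(
    n_samples: int,
    n_folds: int,
    max_train_size: Optional[int] = None,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for rolling CV.

    Assumes rows are sorted oldest to newest (hypothetical helper,
    not part of ML.NET). Every test block is strictly newer than
    the rows it is trained on.
    """
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need n_folds >= 1 and at least n_folds + 1 samples")

    # Equal-sized test blocks; any leftover rows go to the first training set.
    test_size = n_samples // (n_folds + 1)
    first_test_start = n_samples - n_folds * test_size

    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        test_end = test_start + test_size

        # Expanding window by default; sliding window if max_train_size is set.
        train_start = 0
        if max_train_size is not None:
            train_start = max(0, test_start - max_train_size)

        train = list(range(train_start, test_start))
        test = list(range(test_start, test_end))
        yield train, test


if __name__ == "__main__":
    for train_idx, test_idx in rolling_cv_splits(n_samples=10, n_folds=3):
        print("train:", train_idx, "test:", test_idx)
```

With 10 rows and 3 folds, this yields training sets of 4, 6, and 8 rows, each tested on the 2 rows that immediately follow. Setting `max_train_size` turns the expanding window into a fixed-length sliding window, which may be preferable when older data is no longer representative.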

Background on method:
http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html
https://otexts.org/fpp2/accuracy.html#time-series-cross-validation
https://stats.stackexchange.com/questions/14099/using-k-fold-cross-validation-for-time-series-model-selection
https://robjhyndman.com/hyndsight/tscv/
https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting/discussion/46602
https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9

To further investigate missing time-series components, the Azure ML Forecasting Toolkit is a good package listing components needed for this task:

codemzs commented 5 years ago

Thanks, @justinormont! I too feel we should consider rolling CV as part of the time-series effort.