baggiponte opened 1 year ago
Some additional references:
sktime: provides several splitting strategies: temporal_train_test_split (sample-space split only), SingleWindowSplitter, SlidingWindowSplitter, ExpandingWindowSplitter and CutoffSplitter (the window can be a datetime)
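The sliding vs. expanding distinction in the sktime splitters listed above comes down to whether the training window moves forward or grows. A minimal pure-Python sketch, assuming time-ordered samples addressed by position; `window_splits` and its parameters are hypothetical and this is not sktime's implementation or API:

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def window_splits(
    n_samples: int,
    train_size: int,
    test_size: int,
    step: int,
    expanding: bool = False,
) -> Iterator[Split]:
    """Hypothetical helper yielding index splits over time-ordered samples.

    expanding=False: the training window keeps length `train_size` and slides.
    expanding=True: the training window always starts at index 0 and grows.
    """
    if min(train_size, test_size, step) < 1:
        raise ValueError("train_size, test_size and step must be positive")
    train_end = train_size  # exclusive end of the current training window
    while train_end + test_size <= n_samples:
        train_start = 0 if expanding else train_end - train_size
        train = list(range(train_start, train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test
        train_end += step  # move the cutoff forward


for train, test in window_splits(10, train_size=4, test_size=2, step=2):
    print(train, test)
# [0, 1, 2, 3] [4, 5]
# [2, 3, 4, 5] [6, 7]
# [4, 5, 6, 7] [8, 9]
```

With `expanding=True` the same call yields training windows `[0..3]`, `[0..5]` and `[0..7]` against the same test blocks.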
darts: only provides train_test_split, but it supports specifying the splits both in time and in sample space via an axis parameter.
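To illustrate the difference between splitting in time and splitting in sample space, here is a rough sketch on a toy collection of series stored as a dict (series id to values). It does not reproduce darts' train_test_split or its axis numbering; both function names are hypothetical:

```python
from typing import Dict, List, Tuple

Panel = Dict[str, List[float]]  # series id -> values ordered in time


def split_in_time(panel: Panel, test_size: int) -> Tuple[Panel, Panel]:
    """Hypothetical: every series appears in both sets; its last points go to test."""
    if any(not 0 < test_size < len(values) for values in panel.values()):
        raise ValueError("test_size must be between 1 and each series' length - 1")
    train = {key: values[:-test_size] for key, values in panel.items()}
    test = {key: values[-test_size:] for key, values in panel.items()}
    return train, test


def split_in_samples(panel: Panel, test_size: int) -> Tuple[Panel, Panel]:
    """Hypothetical: each series stays whole; the last `test_size` series go to test."""
    keys = list(panel)
    if not 0 < test_size < len(keys):
        raise ValueError("test_size must be between 1 and the number of series - 1")
    train = {key: panel[key] for key in keys[:-test_size]}
    test = {key: panel[key] for key in keys[-test_size:]}
    return train, test


panel = {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "c": [7.0, 8.0, 9.0]}
print(split_in_time(panel, 1)[1])     # {'a': [3.0], 'b': [6.0], 'c': [9.0]}
print(split_in_samples(panel, 1)[1])  # {'c': [7.0, 8.0, 9.0]}
```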
Edit: clarification from @FBruzzesi:
The .backtest(...) method in darts offers flexible training options, including a custom callable for the retrain parameter to control model retraining based on timestamps. This allows for custom time windows, though customizing test sizes may be less straightforward.
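A rough sketch of timestamp-driven retraining in a walk-forward backtest, using a naive mean forecaster as a stand-in model. `walk_forward` and its two-argument `retrain(index, timestamp)` callable are hypothetical and do not mirror the actual signature darts expects for its retrain callable:

```python
from datetime import date, timedelta
from typing import Callable, List, Tuple


def walk_forward(
    dates: List[date],
    values: List[float],
    start: int,
    retrain: Callable[[int, date], bool],
) -> List[Tuple[date, float]]:
    """Hypothetical one-step-ahead backtest with a naive mean model.

    The model is fitted at the first step and refitted only when
    retrain(index, timestamp) returns True.
    """
    if not 0 < start < len(values):
        raise ValueError("start must leave at least one training point")
    forecasts: List[Tuple[date, float]] = []
    fitted_mean = None
    for i in range(start, len(values)):
        if fitted_mean is None or retrain(i, dates[i]):
            history = values[:i]  # only observations before the forecast point
            fitted_mean = sum(history) / len(history)
        forecasts.append((dates[i], fitted_mean))
    return forecasts


# Daily data starting on Monday 2024-01-01; refit only on Mondays.
dates = [date(2024, 1, 1) + timedelta(days=k) for k in range(10)]
values = [float(k) for k in range(10)]
result = walk_forward(dates, values, start=3, retrain=lambda i, ts: ts.weekday() == 0)
print([f for _, f in result])  # [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
```

The forecast stays at the mean fitted on Thursday 2024-01-04 until the Monday refit on 2024-01-08.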
Hey @elyase, thanks for adding the suggestion.
As mentioned on Discord, I was dealing with this kind of issue myself and came up with a rough implementation in timebasedcv (some shameless promotion).
I wanted something scikit-learn compatible that therefore supports (at least) numpy, pandas and polars. I am not claiming it is the perfect solution by any means, but I believe it covers many of the features you are looking for.
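For reference, scikit-learn's model-selection tools work with splitter objects that expose `split(X, y=None, groups=None)`, yielding (train, test) index pairs, and `get_n_splits(...)`. A minimal dependency-free sketch of that convention, assuming rows are already sorted by time; the `ExpandingWindowCV` class is hypothetical and is not timebasedcv's API:

```python
from typing import Iterator, List, Tuple


class ExpandingWindowCV:
    """Hypothetical splitter following scikit-learn's split/get_n_splits convention.

    It only calls len(X), so X may be a list, a NumPy array, or a pandas or
    Polars DataFrame, provided the rows are already sorted by time.
    """

    def __init__(self, n_splits: int, test_size: int) -> None:
        if n_splits < 1 or test_size < 1:
            raise ValueError("n_splits and test_size must be positive")
        self.n_splits = n_splits
        self.test_size = test_size

    def get_n_splits(self, X=None, y=None, groups=None) -> int:
        return self.n_splits

    def split(self, X, y=None, groups=None) -> Iterator[Tuple[List[int], List[int]]]:
        # The last n_splits * test_size rows are tested, one block per split.
        first_test_start = len(X) - self.n_splits * self.test_size
        if first_test_start < 1:
            raise ValueError("not enough rows for the requested splits")
        for k in range(self.n_splits):
            test_start = first_test_start + k * self.test_size
            train = list(range(test_start))  # everything before the test block
            test = list(range(test_start, test_start + self.test_size))
            yield train, test
```

Relying only on `len(X)` is what keeps the splitter agnostic to the dataframe library; yielding NumPy index arrays instead of lists would be a natural refinement.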
Regarding integration with functime, I should double-check how to make it interact with the id/entity column.
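One possible way to handle an id/entity column is to split long-format data on a single global cutoff timestamp, so that every test observation comes after every training observation across all entities. A sketch assuming rows of (entity id, timestamp, value); `split_panel_at` and `unseen_entities` are hypothetical helpers, not functime or timebasedcv functions:

```python
from datetime import date
from typing import List, Set, Tuple

Row = Tuple[str, date, float]  # long format: (entity id, timestamp, value)


def split_panel_at(rows: List[Row], cutoff: date) -> Tuple[List[Row], List[Row]]:
    """Hypothetical: split all entities at one global cutoff (train < cutoff <= test)."""
    train = [row for row in rows if row[1] < cutoff]
    test = [row for row in rows if row[1] >= cutoff]
    return train, test


def unseen_entities(train: List[Row], test: List[Row]) -> Set[str]:
    """Hypothetical: entities that appear only in the test set (no history to fit on)."""
    return {row[0] for row in test} - {row[0] for row in train}


rows = [
    ("a", date(2024, 1, 1), 1.0),
    ("a", date(2024, 1, 2), 2.0),
    ("b", date(2024, 1, 1), 3.0),
    ("b", date(2024, 1, 3), 4.0),
    ("c", date(2024, 1, 3), 5.0),
]
train, test = split_panel_at(rows, cutoff=date(2024, 1, 2))
print(len(train), len(test), unseen_entities(train, test))  # 2 3 {'c'}
```

Entity "c" only shows up after the cutoff, which is the kind of cold-start case an entity-aware splitter would need to report or handle.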
On the Discord server, a while ago we discussed the possibility of generating CV splits based on time intervals (e.g. the first day of the month or week). This might be especially useful in financial settings (e.g. retraining the model on Mondays).
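Generating such interval-based cutoffs can be sketched with the standard library alone. The helpers below are hypothetical; they assume daily dates and return either every occurrence of a given weekday (e.g. Mondays) or every month start within a range:

```python
from datetime import date, timedelta
from typing import List


def weekly_cutoffs(start: date, end: date, weekday: int = 0) -> List[date]:
    """Hypothetical: every date in [start, end] on `weekday` (Monday=0 ... Sunday=6)."""
    current = start + timedelta(days=(weekday - start.weekday()) % 7)
    cutoffs = []
    while current <= end:
        cutoffs.append(current)
        current += timedelta(weeks=1)
    return cutoffs


def _next_month_start(day: date) -> date:
    """First day of the month following `day`'s month."""
    if day.month == 12:
        return date(day.year + 1, 1, 1)
    return date(day.year, day.month + 1, 1)


def month_start_cutoffs(start: date, end: date) -> List[date]:
    """Hypothetical: the first day of every month within [start, end]."""
    current = start if start.day == 1 else _next_month_start(start)
    cutoffs = []
    while current <= end:
        cutoffs.append(current)
        current = _next_month_start(current)
    return cutoffs


# Mondays between Wednesday 2024-01-03 and 2024-01-31: Jan 8, 15, 22 and 29.
print(weekly_cutoffs(date(2024, 1, 3), date(2024, 1, 31)))
print(month_start_cutoffs(date(2023, 12, 15), date(2024, 3, 10)))
```

Each cutoff could then serve as a split boundary: train on data before it and test up to the next cutoff.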
Plus, it might be useful to provide a gap parameter to allow for gaps between the train and test sets (supported e.g. by skforecast). [Plot omitted; see source.]
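A sketch of how a gap could work for position-based sliding windows, assuming `gap` counts the samples skipped between the end of the training window and the start of the test window. `splits_with_gap` is a hypothetical helper, not skforecast's API:

```python
from typing import Iterator, List, Tuple


def splits_with_gap(
    n_samples: int, train_size: int, test_size: int, gap: int, step: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Hypothetical sliding-window splits that skip `gap` samples between train and test."""
    if min(train_size, test_size, step) < 1 or gap < 0:
        raise ValueError("sizes and step must be positive, gap non-negative")
    train_start = 0
    while train_start + train_size + gap + test_size <= n_samples:
        train_end = train_start + train_size  # exclusive
        test_start = train_end + gap  # samples in [train_end, test_start) are unused
        yield (
            list(range(train_start, train_end)),
            list(range(test_start, test_start + test_size)),
        )
        train_start += step


for train, test in splits_with_gap(10, train_size=4, test_size=2, gap=2, step=2):
    print(train, test)
# [0, 1, 2, 3] [6, 7]
# [2, 3, 4, 5] [8, 9]
```

With `gap=0` this reduces to an ordinary sliding window; a positive gap can mimic a delay between the end of the available training data and the start of the forecasting period.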