functime-org / functime

Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasts on panel data.
https://docs.functime.ai
Apache License 2.0

[FEAT] Frequency based cross-validation and gap parameter #89

Open baggiponte opened 10 months ago

baggiponte commented 10 months ago

In the Discord server, a while ago, we mentioned the possibility of generating CV splits based on time intervals (e.g. the first day of the month or week). This might be especially useful in financial settings (e.g. retraining the model on Mondays).
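To make the idea concrete, here is a minimal sketch of frequency-based splitting using only the standard library. The function names `weekly_cutoffs` and `split_by_cutoffs` and the `test_days` parameter are hypothetical and not part of functime; the sketch assumes daily timestamps and a fixed-length test window starting at each cutoff.

```python
from datetime import date, timedelta


def weekly_cutoffs(start: date, end: date, weekday: int = 0) -> list[date]:
    """Return every date in [start, end] that falls on `weekday` (0 = Monday)."""
    # Move forward to the first matching weekday on or after `start`.
    current = start + timedelta(days=(weekday - start.weekday()) % 7)
    cutoffs = []
    while current <= end:
        cutoffs.append(current)
        current += timedelta(weeks=1)
    return cutoffs


def split_by_cutoffs(timestamps, cutoffs, test_days=7):
    """Yield (cutoff, train_idx, test_idx) for each cutoff date.

    Train holds everything strictly before the cutoff; test holds the
    `test_days` days starting at the cutoff. Cutoffs that leave either
    side empty are skipped.
    """
    for cutoff in cutoffs:
        test_end = cutoff + timedelta(days=test_days)
        train_idx = [i for i, t in enumerate(timestamps) if t < cutoff]
        test_idx = [i for i, t in enumerate(timestamps) if cutoff <= t < test_end]
        if train_idx and test_idx:
            yield cutoff, train_idx, test_idx


# Usage: daily observations for January 2024, refit every Monday.
days = [date(2024, 1, 1) + timedelta(days=i) for i in range(31)]
mondays = weekly_cutoffs(days[0], days[-1])
splits = list(split_by_cutoffs(days, mondays))
```

The cutoffs are computed from the calendar rather than from row counts, so the splits stay aligned to Mondays even if some days are missing from the data.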

It might also be useful to provide a gap parameter that leaves a gap between the train and test sets (supported, e.g., by skforecast):

[image: plot source]
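As a rough illustration of what a gap means in sample space, the sketch below builds expanding-window splits where the last `gap` observations before each test window are excluded from training. The function name `expanding_splits_with_gap` and its parameters are hypothetical and do not reflect the functime or skforecast API.

```python
def expanding_splits_with_gap(n_samples, test_size, n_splits, gap=0):
    """Yield (train_idx, test_idx) pairs for expanding-window CV.

    The test windows are the last `n_splits` consecutive blocks of
    `test_size` samples. Each train set ends `gap` samples before its
    test window starts.
    """
    for k in range(n_splits, 0, -1):
        test_start = n_samples - k * test_size
        train_end = test_start - gap  # exclusive upper bound of the train set
        if train_end <= 0:
            raise ValueError("Not enough samples for the requested splits and gap.")
        train_idx = list(range(train_end))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


# Usage: 12 samples, 3 folds of 2 test samples, 1 sample skipped before each test.
folds = list(expanding_splits_with_gap(n_samples=12, test_size=2, n_splits=3, gap=1))
```

With `gap=0` this reduces to ordinary expanding-window splits. scikit-learn's `TimeSeriesSplit` also exposes a `gap` argument with a similar meaning.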

elyase commented 4 months ago

some additional references:

sktime: provides several splitting strategies: temporal_train_test_split (splits in sample space only), SingleWindowSplitter, SlidingWindowSplitter, ExpandingWindowSplitter, and CutoffSplitter (the window can be specified as a datetime).

darts: only provides train_test_split but supports specifying the splits both in time and in sample space via an axis parameter.

edit: clarification from @FBruzzesi

The .backtest(...) method in darts offers flexible training options, including a custom callable for the retrain parameter to control model retraining based on timestamps. This allows for custom time windows, though customizing test sizes may be less straightforward.
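For illustration, a retrain callable of this kind could look like the sketch below. The five positional arguments follow my understanding of what darts passes to the callable (an iteration counter, the forecast timestamp, the training series, and past/future covariates); treat that signature as an assumption and check the darts documentation before relying on it. The name `retrain_on_mondays` is hypothetical.

```python
from datetime import datetime


def retrain_on_mondays(counter, pred_time, train_series, past_covariates, future_covariates):
    """Return True when the model should be refit at this backtest step.

    Assumes `pred_time` is a datetime-like object with a `.weekday()`
    method (0 = Monday); integer-indexed series would need different logic.
    """
    return pred_time.weekday() == 0


# Standalone check of the rule, without darts:
monday = retrain_on_mondays(0, datetime(2024, 1, 1), None, None, None)
tuesday = retrain_on_mondays(1, datetime(2024, 1, 2), None, None, None)
```

A function like this would be passed as `retrain=retrain_on_mondays` to `.backtest(...)`, so the model is only refit when the forecast time falls on a Monday.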

FBruzzesi commented 4 months ago

Hey @elyase, thanks for adding the suggestion.

As mentioned on Discord, I was dealing with this kind of issue myself and came up with a rough implementation in timebasedcv (some shameless promotion).

I wanted something that is scikit-learn compatible and hence supports (at least) numpy, pandas and polars. I am not claiming it is the perfect solution by any means, but I believe it brings a lot of the features that you are looking for.

Regarding integration with functime, I should double-check how it would interact with the id/entity column.