AxeldeRomblay / MLBox

MLBox is a powerful Automated Machine Learning python library.
https://mlbox.readthedocs.io/en/latest/

Temporal leaks #119

Open ThomasBourgeois opened 3 years ago

ThomasBourgeois commented 3 years ago

Hi, very nice lib Axel,

I just tried it. Is there any way to deal with temporal leaks (e.g. for regression)?

Currently I'm not sure the lib deals with temporal leaks. For example, the score that the Optimiser outputs is a cross-validation score, but it is probably slightly overestimated when some folds are trained on data from the future relative to their validation set.

-> One way to deal with this is time-series cross-validation, using the following method: https://robjhyndman.com/hyndsight/tscv/

Sklearn has implemented it as TimeSeriesSplit: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html
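A minimal sketch of the difference, assuming the rows are already sorted by time (the toy data and the helper `count_leaky_folds` are made up for illustration and are not part of MLBox or scikit-learn). It counts the folds in which at least one training row comes later than the earliest validation row:

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

# Twelve toy observations, assumed sorted by time (index 0 is the oldest).
n_samples = 12
X = np.arange(n_samples).reshape(-1, 1)


def count_leaky_folds(splitter, X):
    """Count folds where some training index is later than the earliest validation index."""
    leaky = 0
    for train_idx, valid_idx in splitter.split(X):
        if train_idx.max() > valid_idx.min():
            leaky += 1
    return leaky


# Standard (unshuffled) K-fold: some folds train on rows from the "future".
kfold_leaks = count_leaky_folds(KFold(n_splits=3), X)

# TimeSeriesSplit: each training set ends before its validation set starts.
tscv_leaks = count_leaky_folds(TimeSeriesSplit(n_splits=3), X)

print("KFold leaky folds:", kfold_leaks)
print("TimeSeriesSplit leaky folds:", tscv_leaks)
```

With standard K-fold, only the last fold trains purely on the past. TimeSeriesSplit uses expanding training windows, so no fold sees later rows than the ones it validates on.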

That could be a good PR for MLBox!

Best, Thomas

AxeldeRomblay commented 3 years ago

Hello @ThomasBourgeois, thank you! Indeed, MLBox uses standard CV. The problem with custom CV is that it is not automatic and/or it increases the number of parameters in the Optimiser class... Nevertheless, it should be easy to tweak the code here to add your own CV: https://github.com/AxeldeRomblay/MLBox/blob/master/mlbox/optimisation/optimiser.py#L424

See https://github.com/AxeldeRomblay/MLBox/issues/83 for more details ;) Axel
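For reference, this is roughly what "adding your own cv" looks like in plain scikit-learn. It is a generic sketch, not MLBox's Optimiser code: whether the linked line can take a splitter in this way is an assumption, and the synthetic data and the Ridge model are placeholders chosen only to make the example runnable.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic, time-ordered regression data (placeholder for a real dataset).
rng = np.random.RandomState(0)
n_samples = 200
X = np.arange(n_samples, dtype=float).reshape(-1, 1)  # time index as the only feature
y = np.cumsum(rng.normal(size=n_samples))             # random-walk target

# A splitter whose training folds always precede their validation fold.
custom_cv = TimeSeriesSplit(n_splits=5)

# scikit-learn scoring helpers accept any splitter object through `cv`.
scores = cross_val_score(
    Ridge(),
    X,
    y,
    cv=custom_cv,
    scoring="neg_mean_absolute_error",
)

print("Per-fold negative MAE:", scores)
print("Mean negative MAE:", scores.mean())
```

The only change from standard cross-validation is the object passed as `cv`, so the tweak can stay small if the scoring call in the Optimiser exposes a similar argument.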

ThomasBourgeois commented 3 years ago

Hi @AxeldeRomblay, thanks for your reply, I hadn't seen this was already referenced. Wow, tweaking the code with throw-away lines is a no-go for me, but if you think there is no other way, no problem. ;)