AxeldeRomblay / MLBox

MLBox is a powerful Automated Machine Learning python library.
https://mlbox.readthedocs.io/en/latest/

Shuffling of n_folds #108

Closed hanshupe closed 4 years ago

hanshupe commented 4 years ago

In the optimization pipeline I can define the number of folds. I assume cross-validation on these folds is used to optimize the pipeline?

n_folds (int, default = 2) – The number of folds for cross validation (stratified for classification)

Especially for regression problems with time-dependent data, it would make sense not to pick the observations randomly from the whole dataset, but to use slices of contiguous data points for each fold. An option to define the slices manually would also be nice. Otherwise there is a high risk of overfitting.
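To illustrate what is meant by slices of connected data points, here is a minimal sketch in plain Python. It assumes the rows are already sorted by time. `contiguous_folds` is a hypothetical helper written for this example; it is not part of MLBox and says nothing about how MLBox builds its folds internally.

```python
def contiguous_folds(n_samples, n_folds):
    """Split indices 0..n_samples-1 into n_folds contiguous, unshuffled slices.

    Hypothetical helper for illustration only; not part of MLBox.
    Yields (train_indices, validation_indices) pairs.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        # The validation fold is one unbroken block of consecutive rows.
        validation = list(range(start, stop))
        # Everything outside that block is used for training.
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


for train, validation in contiguous_folds(n_samples=10, n_folds=3):
    print("validation:", validation, "train:", train)
```

Each validation fold here is a block of neighbouring rows instead of a random sample. For a strictly forward-in-time evaluation, one could additionally restrict the training indices to those before the validation block, which is the idea behind scikit-learn's `TimeSeriesSplit`.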

Is such a feature planned or maybe even possible to configure already?

AxeldeRomblay commented 4 years ago

Hi @hanshupe, good point! Have a look at https://github.com/AxeldeRomblay/MLBox/issues/83 for the answer. Feel free to ask if anything is unclear.