In the optimization pipeline I can define the number of folds. I assume cross-validation on these folds is used to optimize the pipeline?
n_folds (int, default = 2) – The number of folds for cross validation (stratified for classification)
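Especially for regression problems with time-dependent data, it would make sense not to pick the observations randomly from the whole dataset, but to use slices of contiguous data points for each fold. An option to define the slices manually would also be nice. Otherwise neighbouring (and therefore highly correlated) observations end up in both the training and the validation fold, so the cross-validation score looks better than it should and there is a high risk of overfitting.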
Is such a feature planned, or is it perhaps already possible to configure this?
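
To make the request concrete, here is a minimal sketch of the kind of splitting I have in mind. It uses scikit-learn's splitters directly on toy data, not this library's optimiser, and I am not assuming the optimiser accepts such objects today. The 12-sample array and the fold sizes are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, PredefinedSplit, TimeSeriesSplit

# Toy time-ordered data (illustrative only): row index = time step.
X = np.arange(12).reshape(-1, 1)

# 1) Contiguous blocks: KFold without shuffling keeps neighbouring
#    observations in the same validation fold.
contiguous = [test.tolist() for _, test in KFold(n_splits=3, shuffle=False).split(X)]

# 2) Forward chaining: every validation block lies strictly after
#    the data used for training.
forward = [
    (train.tolist(), test.tolist())
    for train, test in TimeSeriesSplit(n_splits=3).split(X)
]

# 3) Manually defined slices: test_fold[i] is the id of the fold in which
#    sample i is used for validation (slices may differ in length).
test_fold = np.array([0] * 3 + [1] * 5 + [2] * 4)
manual = [test.tolist() for _, test in PredefinedSplit(test_fold).split()]

print(contiguous)
print(forward)
print(manual)
```

Variant 1 corresponds to "slices of contiguous data points", and variant 3 to defining the slices manually. Variant 2 is a stricter alternative that never trains on data from after the validation period.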