Problem
In our current GridSearch approach, we train the models on the oversampled/augmented train set and run the cross-validation on that same set. This is a problem because oversampling/augmentation happens before the data is split into folds, so the validation split contains samples that the model already saw in the train split. The validation score therefore rewards memorization, and the GridSearch will favor models that overfit.
Resources
https://imbalanced-learn.org/dev/miscellaneous.html#custom-samplers