CrossValidation results suffering from oversampling/augmentation

Problem

In our current GridSearch approach, we first oversample/augment the train set and then run the cross-validation on that same, already augmented set. As a result, a validation split can contain duplicated or augmented copies of samples that the model already saw in the corresponding train split. The cross-validation scores are therefore optimistically biased, and GridSearch will favor models that overfit (i.e. that memorize the training samples).
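To make the leak concrete, here is a minimal stdlib-only sketch on a hypothetical 10:90 toy dataset. It assumes plain random duplication of minority samples as a stand-in for our oversampling/augmentation, and it tracks rows by their original sample index. The sketch compares "oversample, then split" (current approach) with "split, then oversample only the train fold". The helper names (`oversample`, `kfold`, `leaked_fraction`) are illustrative and do not come from the project code.

```python
import random


def oversample(indices, labels, seed=0):
    """Randomly duplicate minority-class indices until all classes are equally frequent."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    out = list(indices)
    for members in by_class.values():
        out += [rng.choice(members) for _ in range(target - len(members))]
    return out


def kfold(items, k=5):
    """Yield (train, val) splits of a list into k contiguous folds."""
    n = len(items)
    bounds = [n * f // k for f in range(k + 1)]
    for f in range(k):
        val = items[bounds[f]:bounds[f + 1]]
        train = items[:bounds[f]] + items[bounds[f + 1]:]
        yield train, val


def leaked_fraction(train, val):
    """Share of validation rows whose original sample also occurs in the train split."""
    seen = set(train)
    return sum(i in seen for i in val) / len(val)


labels = [1] * 10 + [0] * 90  # hypothetical imbalanced toy labels
rng = random.Random(42)

# Current approach: oversample the whole set, then cross-validate on the result.
augmented = oversample(list(range(len(labels))), labels)
rng.shuffle(augmented)
leak_before = [leaked_fraction(tr, va) for tr, va in kfold(augmented)]

# Fix: split the original samples first, then oversample only the train fold.
originals = list(range(len(labels)))
rng.shuffle(originals)
leak_after = [leaked_fraction(oversample(tr, labels), va) for tr, va in kfold(originals)]

print("oversample-then-split leak per fold:", [round(x, 2) for x in leak_before])
print("split-then-oversample leak per fold:", leak_after)
```

With the second ordering, the validation folds contain only original samples that the train fold never saw, so the leaked fraction is zero by construction.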

Resources

https://imbalanced-learn.org/dev/miscellaneous.html#custom-samplers

dominikmn / one-million-posts

CrossValidation results suffering from oversampling/augmentation #97