YyzHarry / imbalanced-regression

[ICML 2021, Long Talk] Delving into Deep Imbalanced Regression
http://dir.csail.mit.edu
MIT License
806 stars, 128 forks

5-fold cross-validation of continuous target variable that has a highly-skewed distribution compared to normal distribution #12

Closed monajalal closed 2 years ago

monajalal commented 2 years ago

I was wondering if you have a suggestion on how to do something like sklearn's stratified K-fold cross-validation, which is meant for classification tasks with categorical labels, but for a regression task with a continuous target variable?

Unfortunately, sklearn doesn't have such an option.

YyzHarry commented 2 years ago

Hi - For the StratifiedKFold module in sklearn, the documentation states: "The folds are made by preserving the percentage of samples for each class." So to replicate it in regression problems, a simple way is to divide the target range into discrete bins and count the number of samples in each bin (the bin resolution depends on the specific problem). Then treat the bins as classes and split the folds the same way.
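A minimal sketch of the binning idea above, not code from this repo. The helper names `bin_targets` and `stratified_kfold_regression` are hypothetical, and the use of quantile-based bin edges is an assumption made here so that no bin ends up with fewer samples than folds on a skewed target; equal-width edges over the target range (e.g. via `np.linspace`) would follow the description more literally.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def bin_targets(y, n_bins=10):
    """Map continuous targets to integer bin labels (hypothetical helper).

    Assumption: quantile-based edges, so bins hold roughly equal counts
    even when the target distribution is highly skewed.
    """
    y = np.asarray(y, dtype=float)
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1))
    # Keep interior edges only and drop duplicates caused by tied values.
    interior = np.unique(edges[1:-1])
    # Labels run from 0 to len(interior).
    return np.digitize(y, interior)


def stratified_kfold_regression(y, n_splits=5, n_bins=10, seed=0):
    """Yield (train_idx, test_idx) pairs, stratified on binned targets."""
    bin_ids = bin_targets(y, n_bins=n_bins)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # StratifiedKFold only needs the labels to stratify; the first
    # argument is a placeholder with the right number of rows.
    yield from skf.split(np.zeros(len(bin_ids)), bin_ids)
```

Each yielded pair can index into the feature matrix and the original continuous targets, e.g. `X[train_idx], y[train_idx]`; the bin labels are used only to form the folds. The number of bins is a trade-off: too few loses resolution in the tails, too many leaves bins with fewer samples than folds.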

Since the question is not quite related to the repo, I'm closing it for now. Feel free to comment if you have any related questions.