Support for data balancing

felixbur / nkululeko

Machine learning speaker characteristics

MIT License


Support for data balancing #88

Closed. bagustris closed this 7 months ago

bagustris commented 8 months ago

Data balancing is important for machine learning.

I would like to propose the following feature:

[DATA]
balancing = True # default false
balancing_strategy = ros # options: ros, smote, adasyn

There is currently the imbalanced-learn package, which is a scikit-learn contrib package. However, there is no need to stick to this package (the strategy can be defined in balancing_strategy above).
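To illustrate what the simplest of the proposed strategies (ros, random oversampling) does, here is a minimal standard-library sketch. It is not how nkululeko implements the feature; the function name `random_oversample` and the toy data are hypothetical. A real implementation would more likely rely on imbalanced-learn, which provides `RandomOverSampler`, `SMOTE`, and `ADASYN` in `imblearn.over_sampling`.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every
    class has as many samples as the largest class.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class

    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        # Indices of the original samples belonging to this class
        idx = [i for i, y in enumerate(labels) if y == cls]
        # Draw (with replacement) as many copies as are missing
        for _ in range(target - n):
            j = rng.choice(idx)
            out_x.append(features[j])
            out_y.append(cls)
    return out_x, out_y


# Toy example: four "neutral" samples versus one "angry" sample
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["neutral", "neutral", "neutral", "neutral", "angry"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Random oversampling only repeats existing samples, whereas smote and adasyn synthesize new feature vectors by interpolating between neighbors, which is why they operate on extracted features rather than on raw audio.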

felixbur commented 8 months ago

Yes, makes sense. I used to work with the imbalanced-learn package some years ago. I wanted to wait until my pal Uwe releases his stratification package, but that might take too long, so we could go ahead with imbalanced-learn.

felixbur commented 8 months ago

I guess that would be a special filter, set in the experiment class, that runs as a post-processing step after feature extraction.

felixbur commented 8 months ago

For brevity, I'd prefer:

[DATA]
balancing = ros # options: ros, smote, adasyn

felixbur commented 7 months ago

Done with version 0.70.0. I changed it to

[FEATS]
balancing = ros # options: ros, smote, adasyn

because it's really the feature sets that are varied.
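As a rough sketch of how a `[FEATS] balancing` entry like this could be read and validated, the following uses Python's standard `configparser`. The helper name `read_balancing` and the validation logic are assumptions for illustration, not nkululeko's actual code. Note that `configparser` only strips trailing `# ...` comments when `inline_comment_prefixes` is set.

```python
import configparser

# Strategy names taken from the thread; the set itself is an assumption
VALID_STRATEGIES = {"ros", "smote", "adasyn"}


def read_balancing(ini_text):
    """Return the balancing strategy from an INI string, or None if unset.

    Hypothetical helper for illustration only.
    """
    # Without inline_comment_prefixes the value would include the
    # trailing "# options: ..." comment.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(ini_text)

    value = parser.get("FEATS", "balancing", fallback=None)
    if value is None:
        return None  # balancing not requested

    value = value.strip().lower()
    if value not in VALID_STRATEGIES:
        raise ValueError(f"unknown balancing strategy: {value!r}")
    return value


example = """
[FEATS]
balancing = ros # options: ros, smote, adasyn
"""
print(read_balancing(example))
```

Treating a missing key as "no balancing" removes the need for the separate boolean switch from the original proposal.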