ClimbsRocks / auto_ml

[UNMAINTAINED] Automated machine learning for analytics & production
http://auto-ml.readthedocs.io
MIT License

create an oversampling module #278

Open ClimbsRocks opened 7 years ago

ClimbsRocks commented 7 years ago

things we want the user to be able to tweak:

  1. the final ratio of classes (keep_original_class_ratio, or hard-coded percentages for each class, like [0.25, 0.75])
  2. how many samples they want total
  3. the params for SMOTE
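A minimal sketch of how the first two knobs could turn into per-class row targets. The function name `target_class_counts`, its arguments, and the use of a dict for the ratios (rather than a list like [0.25, 0.75]) are all assumptions for illustration, not part of auto_ml.

```python
def target_class_counts(class_counts, total=None, ratios=None):
    """Return how many rows each class should have after resampling.

    class_counts: dict mapping class label -> number of original rows.
    total: desired total row count (defaults to the original total).
    ratios: dict mapping class label -> desired fraction of the output.
            None keeps the original class ratio (hypothetical stand-in
            for the keep_original_class_ratio setting).
    """
    original_total = sum(class_counts.values())
    if total is None:
        total = original_total
    if ratios is None:
        ratios = {label: n / original_total for label, n in class_counts.items()}
    if set(ratios) != set(class_counts):
        raise ValueError("ratios must name exactly the classes in class_counts")
    if abs(sum(ratios.values()) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    # Rounding means the counts may differ from `total` by a row or two.
    return {label: round(total * frac) for label, frac in ratios.items()}


# Illustrative numbers only: 900 majority rows, 100 minority rows.
counts = target_class_counts({0: 900, 1: 100}, total=2000, ratios={0: 0.75, 1: 0.25})
```

The difference between a class's target and its original count is then the number of rows to synthesize (if positive) or drop (if negative).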

for the minority class, it's pretty easy:

  1. take the full original dataset
  2. fit every kind of SMOTE and ADASYN possible (i think there are 5 varieties of SMOTE)
  3. keep taking samples from our original dataset until we're up to the level we need
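The "keep sampling until we reach the target" loop could look roughly like the sketch below. It uses a simplified SMOTE-style interpolation written in plain Python, not any library's implementation, and it covers only one basic variety rather than every SMOTE and ADASYN variant. The name `smote_like_samples` and its parameters are hypothetical.

```python
import random


def smote_like_samples(minority_rows, n_new, k=3, seed=0):
    """Generate n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core idea of SMOTE)."""
    if len(minority_rows) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    while len(synthetic) < n_new:  # keep sampling until we hit the target
        base = rng.choice(minority_rows)
        others = [row for row in minority_rows if row is not base]
        neighbours = sorted(others, key=lambda row: sq_dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Illustrative data only: four minority rows with two features each.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_rows = smote_like_samples(minority, n_new=6)
```

Each synthetic row lies on the line segment between two real minority rows, so it stays inside the region the minority class already occupies.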

For the majority class, it's a little trickier. We'll want to do this one after the minority class. We can do one of two things:

  1. undersample the majority class so it becomes the minority, doing this differently for every oversampler we fit. then possibly use the whole thing for actually getting the new samples, after fitting?
  2. just add in all the synthetic minority samples from SMOTE, so now we've got our true majority rows, our true minority rows, and a bunch of synthetic minority rows, and the majority rows actually make up less than 50% of this particular sample.
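To make the two options concrete, here is a rough sketch with made-up row counts. The helper names `undersample` and `majority_fraction` are hypothetical, and the random undersampling shown is just one possible way to shrink the majority class.

```python
import random


def undersample(rows, n_keep, seed=0):
    """Option 1: randomly keep n_keep rows (sampling without replacement)."""
    return random.Random(seed).sample(rows, n_keep)


def majority_fraction(n_majority, n_minority, n_synthetic):
    """Option 2: share of majority rows once synthetic minority rows are added."""
    return n_majority / (n_majority + n_minority + n_synthetic)


majority = list(range(900))  # stand-ins for 900 true majority rows
n_minority = 100             # true minority rows

# Option 1: shrink the majority below the minority size before fitting,
# using a different seed for each oversampler that gets fit.
shrunk = undersample(majority, n_keep=50, seed=1)

# Option 2: keep all 900 majority rows and add 1,000 synthetic minority rows.
frac = majority_fraction(len(majority), n_minority, n_synthetic=1000)
```

With these numbers, option 2 leaves the majority class at 900 of 2,000 rows, which is 45% of the sample.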