georgedouzas / imbalanced-learn-extra

Implementation of novel oversampling algorithms.
https://georgedouzas.github.io/imbalanced-learn-extra/
MIT License
32 stars, 16 forks

Extended gsmote with smotenc mechanism for categorical features #14

Closed by joaopfonseca 4 years ago

joaopfonseca commented 4 years ago

This adaptation lets the user pass categorical_features; the SMOTE-NC procedure proposed in the original SMOTE paper is then applied to those features. It works much like imblearn.over_sampling.SMOTENC does for categorical features. It passes both the existing tests and the tests originally used for imblearn's SMOTENC.
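For anyone unfamiliar with the SMOTE-NC rule, here is a minimal sketch of the categorical step, not the code in this PR. It assumes the k nearest minority neighbors have already been found. The continuous features use plain SMOTE linear interpolation as a stand-in for the geometric generation step. The names synthesize, majority_category and gap are hypothetical and exist only for this illustration.

```python
from collections import Counter


def majority_category(neighbor_values):
    """Return the most frequent categorical value among the neighbors.

    Ties go to the value seen first (an arbitrary choice in this sketch).
    """
    return Counter(neighbor_values).most_common(1)[0][0]


def synthesize(sample, neighbors, categorical_features, gap):
    """Build one synthetic sample from `sample` and its nearest neighbors.

    sample: list of feature values for one minority-class instance.
    neighbors: list of neighbor instances (same layout as `sample`).
    categorical_features: set of column indices holding categories.
    gap: interpolation factor in [0, 1] for the continuous columns.
    """
    # A real sampler picks the target neighbor at random; the first
    # neighbor is used here to keep the sketch deterministic.
    target = neighbors[0]
    new_sample = []
    for j, value in enumerate(sample):
        if j in categorical_features:
            # SMOTE-NC rule: take the majority category of the neighbors.
            new_sample.append(majority_category([n[j] for n in neighbors]))
        else:
            # Stand-in for the geometric step: linear interpolation.
            new_sample.append(value + gap * (target[j] - value))
    return new_sample


sample = [1.0, "a"]
neighbors = [[3.0, "b"], [2.0, "b"], [5.0, "a"]]
print(synthesize(sample, neighbors, categorical_features={1}, gap=0.5))
```

The categorical column is never interpolated: it is copied from whichever category is most common among the neighbors, so synthetic samples only contain categories that already exist in the data.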

BTW, the new tests include sparse matrix inputs, so I adapted a few parts of the code to support sparse inputs (as referenced in issue #1). Honestly, I'm not entirely sure how useful this is, since I have very rarely used sparse matrix formats, but it's there anyway.