LSSTDESC / snmachine

Machine learning code for photometric supernova classification
BSD 3-Clause "New" or "Revised" License

Data augmentation via Nearest Neighbour algorithms #246

Open Catarina-Alves opened 3 years ago

Catarina-Alves commented 3 years ago

It would be useful to include a class that encapsulates data augmentation via Nearest Neighbour-inspired algorithms such as SMOTE (Synthetic Minority Over-sampling Technique) and ADASYN (Adaptive Synthetic sampling). @tallamjr developed some code for this, which is saved in utils/imblearn_augment.py.
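For readers unfamiliar with the technique, here is a minimal sketch of SMOTE-style interpolation using only the Python standard library. It is not the code in utils/imblearn_augment.py, and the function name `smote_like_oversample` and its parameters are hypothetical, chosen for illustration. Each synthetic point is placed at a random position on the line segment between a minority-class sample and one of its `k` nearest minority-class neighbours.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority-class neighbours.

    `minority` is a list of feature vectors (lists of floats) that all belong
    to the minority class. Hypothetical helper, for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: base + gap * (neighbour - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_oversample(minority_features, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two real samples, the new points never leave the region spanned by the minority class. ADASYN differs mainly in how many synthetic points it generates around each sample, which this sketch does not attempt.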

I propose implementing this data augmentation methodology in snaugment, which involves testing the code and developing unit tests. Note that, in a previous analysis, we found that SMOTE augmentation leads to information leaks in the classification step, so this must be checked when implementing this augmentation.
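The issue does not say how the leak arose. One common cause with interpolation-based augmentation, assumed here, is oversampling before the train/test split, so that synthetic training points are built from objects that end up in the test set. A minimal sketch of the safer ordering (split first, then augment only the training portion) follows; `midpoint_augment` and `split_then_augment` are hypothetical names, not part of snmachine, and the augmenter is a deliberately simple stand-in for SMOTE.

```python
import random


def midpoint_augment(X, y, minority_label):
    """Hypothetical augmenter: midpoints of consecutive minority-class samples."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    X_new = [
        [(a + b) / 2 for a, b in zip(p, q)]
        for p, q in zip(minority, minority[1:])
    ]
    return X_new, [minority_label] * len(X_new)


def split_then_augment(X, y, minority_label, test_fraction=0.25, seed=0):
    """Split into train/test first, then augment the training set only."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    rng.shuffle(indices)
    n_test = max(1, round(test_fraction * len(indices)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]

    # Synthetic samples are derived from training objects only, so no
    # test object contributes to anything the classifier is trained on.
    X_new, y_new = midpoint_augment(X_train, y_train, minority_label)
    return X_train + X_new, y_train + y_new, X_test, y_test


# Toy data: 12 objects, label 1 is the minority class.
X = [[float(i), float(i % 3)] for i in range(12)]
y = [1 if i % 4 == 0 else 0 for i in range(12)]
X_train, y_train, X_test, y_test = split_then_augment(X, y, minority_label=1)
```

A unit test for the real implementation could assert the same property: every object used to generate a synthetic sample belongs to the training split. This sketch uses a plain random split; a stratified split may be preferable for strongly imbalanced classes.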

Files: snaugment.py, utils/imblearn_augment.py

Catarina-Alves commented 3 years ago

Although we did not find this code to work for our imbalanced problem, it might be useful for someone else.