MetOffice / XBTs_classification

Project for the classification of eXpendable Bathy Thermographs
BSD 3-Clause "New" or "Revised" License

Investigate bootstrapping / oversampling #109

Closed stevehadd closed 3 years ago

stevehadd commented 3 years ago

Oversampling and bootstrapping techniques are an alternative way to deal with the class imbalance problem. They also give us a way to quantify the uncertainty in our results. The existing ensemble technique is one way of doing this; through random sampling, we could generate more ensemble members.
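To make the oversampling idea concrete, here is a minimal sketch of random oversampling using only the Python standard library. It is not the code used in this project: the function name `random_oversample`, the `max_depth` feature and the probe-type labels are hypothetical values chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Keep every original row, then draw the extras with replacement.
        extras = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extras:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy, heavily imbalanced data (hypothetical feature and labels).
rows = [{"max_depth": d} for d in (450, 460, 455, 470, 760, 1830)]
labels = ["T4", "T4", "T4", "T4", "T7", "T5"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.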

https://machinelearningmastery.com/random-oversampling-and-undersampling-for-imbalanced-classification/
https://elitedatascience.com/imbalanced-classes
https://www.analyticsvidhya.com/blog/2020/10/improve-class-imbalance-class-weights/
https://www.analyticsvidhya.com/blog/2020/07/10-techniques-to-deal-with-class-imbalance-in-machine-learning/

https://machinelearningmastery.com/smote-oversampling-for-imbalanced-classification/
https://imbalanced-learn.org/stable/over_sampling.html#a-practical-guide

stevehadd commented 3 years ago

Yet more info on bootstrapping: https://machinelearningmastery.com/a-gentle-introduction-to-the-bootstrap-method/
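As a rough illustration of how the bootstrap can quantify uncertainty, the sketch below resamples a set of predictions with replacement and recomputes accuracy for each resample. This is a simplified variant that resamples the evaluation data rather than retraining a classifier per ensemble member, and it is not the implementation used in the notebooks; the function name `bootstrap_accuracy` and the toy labels are assumptions made for the example.

```python
import random
import statistics


def bootstrap_accuracy(y_true, y_pred, n_members=200, seed=0):
    """Return one accuracy score per bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_members):
        # Draw n indices with replacement: one bootstrap sample.
        idx = [rng.randrange(n) for _ in range(n)]
        correct = sum(y_true[i] == y_pred[i] for i in idx)
        scores.append(correct / n)
    return scores


# Toy labels and predictions (hypothetical values).
y_true = ["T4", "T4", "T7", "T4", "T5", "T7", "T4", "T4", "T7", "T5"]
y_pred = ["T4", "T4", "T7", "T7", "T5", "T4", "T4", "T4", "T7", "T4"]

scores = sorted(bootstrap_accuracy(y_true, y_pred))
mean_score = statistics.mean(scores)

# Approximate 95% percentile interval taken from the sorted scores.
low = scores[int(0.025 * len(scores))]
high = scores[int(0.975 * len(scores))]
print(f"accuracy ~ {mean_score:.2f}, interval [{low:.2f}, {high:.2f}]")
```

The spread of the scores gives a sense of how much the metric depends on which samples happened to be in the dataset. Increasing `n_members` corresponds to adding more ensemble members.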

stevehadd commented 3 years ago

This has been implemented in various notebooks and updated for the batch code in PR #111.