bmoore20 / habs

Detect Harmful Algal Blooms (HABs) in images of the Finger Lakes.

Imbalanced Classification #64

Open bmoore20 opened 3 years ago

bmoore20 commented 3 years ago

Need to deal with the major imbalance in size between the two classes (bga and non_algae).

The links below explain why class balance in the training set matters and describe methods for handling imbalanced data:

https://www.researchgate.net/post/Machine-learning-if-proportion-of-number-of-cases-in-different-class-in-training-set-matters

https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a-html/chawla2002.html

https://machinelearningmastery.com/combine-oversampling-and-undersampling-for-imbalanced-classification/

https://paperswithcode.com/method/smote

bmoore20 commented 3 years ago

Ways to implement balanced sampling with PyTorch:

https://discuss.pytorch.org/t/balanced-sampling-between-classes-with-torchvision-dataloader/2703/26

https://github.com/ufoym/imbalanced-dataset-sampler
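A minimal sketch of the balanced-sampling idea from the links above, using PyTorch's built-in `WeightedRandomSampler`. The dataset here is a hypothetical stand-in (placeholder tensors with the label counts from this issue, and an assumed label encoding of 0 = non_algae, 1 = bga), not the project's actual image loader:

```python
from collections import Counter

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical stand-in dataset: 231 non_algae (label 0) and 5 bga (label 1).
labels = torch.tensor([0] * 231 + [1] * 5)
images = torch.zeros(len(labels), 3, 8, 8)  # placeholder image tensors
dataset = TensorDataset(images, labels)

# Weight every sample by the inverse of its class frequency, so each class
# carries the same total sampling weight.
class_counts = torch.bincount(labels)        # counts per class: [231, 5]
class_weights = 1.0 / class_counts.float()
sample_weights = class_weights[labels]

# Draw with replacement so the minority class can be picked repeatedly.
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True,
)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

# Count how often each class is drawn over one pass through the loader.
torch.manual_seed(0)
drawn = Counter()
for _, batch_labels in loader:
    drawn.update(batch_labels.tolist())
print(drawn)
```

With these weights each batch should contain roughly equal numbers of both classes. Because there are only 5 BGA images, the same images are repeated many times per epoch, so this would likely need to be combined with data augmentation or more BGA data.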

bmoore20 commented 3 years ago

Current TOTAL HABs Dataset Count:

BGA: 5

Non-Algae: 231 (156 clear, 75 turbid)

**The Non-Algae class is 46.2 times the size of the BGA class (231 / 5 = 46.2).**
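To make the scale of the problem concrete, the arithmetic on the counts above can be sketched as follows. The variable names are illustrative only, and the "always predict non_algae" baseline is a hypothetical classifier used for comparison, not something from the project:

```python
# Class counts taken from the dataset summary in this issue.
counts = {"bga": 5, "non_algae": 231}
total = sum(counts.values())

# Imbalance ratio: how many non_algae images exist per bga image.
ratio = counts["non_algae"] / counts["bga"]

# Share of the dataset that is bga.
bga_share = counts["bga"] / total

# Accuracy of a hypothetical classifier that always predicts non_algae.
baseline_accuracy = counts["non_algae"] / total

print(f"total images: {total}")
print(f"imbalance ratio: {ratio:.1f}")
print(f"bga share: {bga_share:.1%}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.1%}")
```

A model could reach about 97.9% accuracy without ever detecting a bloom, so plain accuracy is a misleading metric on this dataset; per-class recall or a confusion matrix is more informative.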