Open bmoore20 opened 3 years ago
Need to deal with the major imbalance in the size of the two classes (bga and non_algae).
Below are some good links that explain why class balance matters and describe methods for handling the problem:
https://www.researchgate.net/post/Machine-learning-if-proportion-of-number-of-cases-in-different-class-in-training-set-matters
https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a-html/chawla2002.html
https://machinelearningmastery.com/combine-oversampling-and-undersampling-for-imbalanced-classification/
https://paperswithcode.com/method/smote
Ways to implement this with PyTorch:
https://discuss.pytorch.org/t/balanced-sampling-between-classes-with-torchvision-dataloader/2703/26
https://github.com/ufoym/imbalanced-dataset-sampler
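As a rough sketch of the balanced-sampling idea discussed in the links above, PyTorch's built-in `WeightedRandomSampler` can draw each sample with probability inversely proportional to its class frequency. The dataset here is a hypothetical stand-in (random features, label 0 = bga, label 1 = non_algae) sized to match the counts in this issue; the real HABs dataset class, batch size, and seed are assumptions, not the project's actual setup.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)  # arbitrary seed, only to make the sketch repeatable

# Hypothetical stand-in data: 5 bga samples (label 0), 231 non_algae samples (label 1).
labels = torch.cat([torch.zeros(5, dtype=torch.long),
                    torch.ones(231, dtype=torch.long)])
features = torch.randn(len(labels), 3)  # placeholder features, not real images
dataset = TensorDataset(features, labels)

# Weight each sample by the inverse of its class count,
# so both classes are drawn about equally often.
class_counts = torch.bincount(labels)        # one count per class
class_weights = 1.0 / class_counts.float()
sample_weights = class_weights[labels]       # one weight per sample

# Sampling with replacement lets the 5 bga samples be drawn repeatedly.
sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(labels),
                                replacement=True)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

# Collect the labels seen in one epoch to check the resulting class mix.
drawn = torch.cat([batch_labels for _, batch_labels in loader])
bga_fraction = (drawn == 0).float().mean().item()
```

Because the bga samples are drawn with replacement, each epoch repeats the same 5 images many times, so this would likely need to be paired with augmentation or more bga data to avoid overfitting.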
Current total HABs dataset count:
BGA: 5, Non-Algae: 231 (156 clear, 75 turbid)
Non-Algae is 46.2x the size of BGA (231 / 5 = 46.2).
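To make the arithmetic concrete, the snippet below recomputes the imbalance ratio from the counts above and derives per-class weights using the common "balanced" heuristic, total / (number of classes x class count). The heuristic and the variable names are illustrative assumptions; weights like these are one typical input to a weighted sampler or a class-weighted loss.

```python
# Class counts taken from this issue.
counts = {"bga": 5, "non_algae": 231}

total = sum(counts.values())
imbalance_ratio = counts["non_algae"] / counts["bga"]

# "Balanced" heuristic (an assumption, not the project's chosen method):
# weight = total / (n_classes * class_count), so rarer classes get larger weights.
class_weights = {name: total / (len(counts) * n) for name, n in counts.items()}

print(f"imbalance ratio: {imbalance_ratio:.1f}")
for name, weight in class_weights.items():
    print(f"{name}: weight {weight:.3f}")
```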