dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International

Low Accuracy in Image Classification Problem #2163

Closed TheAwerx closed 2 years ago

TheAwerx commented 2 years ago

System Information (please complete the following information):

Describe the bug

===============================================Experiment Results=================================================

| Summary |

| ML Task: ImageClassification |
| Dataset: |
| Label : Label |
| Total experiment time : 228,21 Secs |
| Total number of models explored: 1 |

| Top 1 models explored |

|   | Trainer        | MicroAccuracy | MacroAccuracy | Duration | #Iteration |
| 0 | DNN + ResNet50 | 0,4610        | 0,4624        | 228,2    | 0          |

Expected behavior

I expected about 100 epochs and an accuracy over 0,8 (80%).

Screenshots

[image]
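For readers unfamiliar with the two accuracy columns in the results above: in ML.NET, micro-accuracy is the fraction of all samples classified correctly, while macro-accuracy is the unweighted mean of the per-class accuracies. The sketch below illustrates the difference in Python. The function name `micro_macro_accuracy` and the toy labels are made up for illustration and are not taken from this issue.

```python
from collections import defaultdict


def micro_macro_accuracy(y_true, y_pred):
    """Return (micro, macro) accuracy for two equal-length label lists."""
    correct = defaultdict(int)  # correct predictions per true class
    total = defaultdict(int)    # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Micro: every sample counts equally.
    micro = sum(correct.values()) / sum(total.values())
    # Macro: every class counts equally, regardless of its size.
    macro = sum(correct[c] / total[c] for c in total) / len(total)
    return micro, macro


# Hypothetical toy labels: class "a" has 4 samples, class "b" has 2.
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "b", "b", "a"]
micro, macro = micro_macro_accuracy(y_true, y_pred)
print(round(micro, 4), round(macro, 4))
```

When every class has the same number of samples the two values coincide, so the nearly identical 0,4610 and 0,4624 reported above are at least consistent with a roughly balanced dataset.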

LittleLittleCloud commented 2 years ago

The trainer reaches around 80% accuracy on the training set, but that drops to ~46% on the validation set.

I personally don't think it's caused by a low epoch count, as the validation accuracy has been ~46% since epoch 10, which indicates no improvement in the following epochs.

Can you share a snapshot of what your dataset looks like? Low accuracy can have many possible causes. Given the high accuracy on the training set, I suspect the network may just "remember" the pictures in the training set and fail to generalize to images it has not seen.
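The two signals described in this comment, a large gap between training and validation accuracy and a validation curve that stops improving early, can be checked from per-epoch accuracy logs. The following is a minimal sketch under stated assumptions: the helper names, the `tolerance` parameter, and the accuracy values are all hypothetical and chosen only to resemble the pattern described, not copied from the screenshots.

```python
def generalization_gap(train_acc, val_acc):
    """Final training accuracy minus final validation accuracy."""
    return train_acc[-1] - val_acc[-1]


def plateau_epoch(val_acc, tolerance=0.01):
    """Return the first 0-based epoch after which validation accuracy
    never exceeds that epoch's value by more than `tolerance`."""
    if not val_acc:
        raise ValueError("val_acc must not be empty")
    for epoch, acc in enumerate(val_acc):
        if max(val_acc[epoch:]) - acc <= tolerance:
            return epoch
    # Unreachable for non-empty input: the last epoch always qualifies.
    return len(val_acc) - 1


# Hypothetical per-epoch accuracies (not taken from the issue).
train_acc = [0.30, 0.45, 0.60, 0.70, 0.76, 0.80]
val_acc = [0.28, 0.40, 0.46, 0.46, 0.455, 0.46]

print("gap:", round(generalization_gap(train_acc, val_acc), 2))
print("plateau starts at epoch:", plateau_epoch(val_acc))
```

A large gap combined with an early plateau suggests memorization rather than too little training, in which case running more epochs is unlikely to help.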

TheAwerx commented 2 years ago

@LittleLittleCloud There are 150 classes and about 500 images for each class. The images are 299x299. [images: report2, report1]
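To put these numbers in context, the arithmetic below works out the approximate dataset size and the accuracy expected from random guessing. It assumes perfectly balanced classes, and the 20% validation fraction is an assumption for illustration only, since the thread does not state the split that was used.

```python
def dataset_summary(num_classes, images_per_class, validation_fraction):
    """Rough dataset arithmetic, assuming perfectly balanced classes."""
    val_per_class = round(images_per_class * validation_fraction)
    return {
        "total_images": num_classes * images_per_class,
        "train_per_class": images_per_class - val_per_class,
        "val_per_class": val_per_class,
        # Random guessing among equally likely classes.
        "chance_accuracy": 1 / num_classes,
    }


# 150 classes and about 500 images each, per the comment above.
# validation_fraction=0.2 is an assumed value, not stated in the thread.
summary = dataset_summary(150, 500, validation_fraction=0.2)
print(summary["total_images"])
print(f"{summary['chance_accuracy']:.2%}")
```

With 150 balanced classes, random guessing would score well under 1%, so ~46% validation accuracy is far above chance even though it falls short of the expected 80%.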

LittleLittleCloud commented 2 years ago

@TheAwerx Are the differences among the 150 classes obvious? By obvious I mean easily distinguished by the human eye. For example, in the following pictures, the first is adana kebap and the second is beyti kababi, and the difference between them is not obvious (both are just kebabs). image image

In that situation, a DNN can still distinguish them by just remembering them, but on an unseen dataset, remembering pictures is not enough. It needs to rely on generalized rules, like color, edges and so on. That's why a DNN can classify weather, because sunny and rainy images have different colors, but it's difficult for a DNN to classify different kebaps.

Since I have only seen one class of your images, I can't say for sure what causes the low validation score. It might be that the differences among classes are not obvious, so the DNN is just remembering all the pictures it has seen and failing to generalize rules for classifying pictures during validation.

In this case, you need to do more feature engineering to help the DNN find generalized rules rather than just throwing everything into Model Builder, or add more images to your training set, or both.
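One way to test the hypothesis that visually similar classes are being mixed up is to count which (true, predicted) pairs occur most often among the validation errors. The sketch below is only an illustration: the function `most_confused_pairs` is hypothetical, the labels are invented, and "baklava" is a made-up third class that does not come from this thread.

```python
from collections import Counter


def most_confused_pairs(y_true, y_pred, top_n=3):
    """Return the top_n most frequent (true, predicted) error pairs."""
    confusions = Counter(
        (truth, pred) for truth, pred in zip(y_true, y_pred) if truth != pred
    )
    return confusions.most_common(top_n)


# Invented validation labels and predictions, for illustration only.
y_true = ["adana", "adana", "adana", "beyti", "beyti", "baklava", "baklava"]
y_pred = ["beyti", "beyti", "adana", "adana", "beyti", "baklava", "baklava"]

for (truth, pred), count in most_confused_pairs(y_true, y_pred):
    print(f"{truth} predicted as {pred}: {count}")
```

If most errors concentrate in a few look-alike pairs, collecting more varied images for those classes, or merging them, may help more than adding data uniformly.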

TheAwerx commented 2 years ago

@LittleLittleCloud I am grateful for your advice. I will try to improve my dataset and try again later. Hopefully this time I can get better results.

LittleLittleCloud commented 2 years ago

Cool. Since this issue is related to training performance rather than a bug in Model Builder, I'm going to close it. Feel free to re-open it if you still have questions after improving the dataset and retraining.