JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
https://www.jaided.ai
Apache License 2.0

help for bad result on test data #941

Open mdanesh90 opened 1 year ago

mdanesh90 commented 1 year ago

Hi, I trained this model on my own data in Persian and English, but I get bad results when I use my weights (which reached 90% accuracy in training) with the reader function for inference. The output looks like a few repeated words, not even close to the right label. I also tried this on my training data, and it can't recognize the labels there either. Before training, I used EasyOCR on the same pictures used in test and train, and it could recognize them, but after training on my data it can't. Does anyone have an idea of what I am doing wrong?

EivindKjosbakken commented 7 months ago

I would guess your dataset is not diverse enough, so the model overfits to the custom data you are feeding it. To deal with this you need a large, diverse dataset, which you can build, for example, with synthetic generation. I wrote an article in TowardsAI about that, if you are interested: https://pub.towardsai.net/how-to-make-a-synthesized-dataset-to-fine-tune-your-ocr-3573f1a7e08b. Also, I would try training less on your data (for example by lowering the learning rate, or running fine-tuning for fewer epochs/iterations), as this can help prevent overfitting.
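
One way to make "running fine-tuning for fewer iterations" concrete is patience-based early stopping on a held-out validation set: keep the checkpoint with the lowest validation loss and stop once the loss has not improved for a few evaluations in a row. The snippet below is only a rough sketch of that idea in plain Python. It is not part of EasyOCR or its trainer; the function name `pick_stop_checkpoint`, the `patience` and `min_delta` parameters, and the loss values are all hypothetical and chosen for illustration.

```python
from typing import Optional, Sequence


def pick_stop_checkpoint(
    val_losses: Sequence[float],
    patience: int = 3,
    min_delta: float = 0.0,
) -> Optional[int]:
    """Return the index of the checkpoint to keep under early stopping.

    val_losses holds one validation loss per saved checkpoint, in training
    order. Scanning stops once `patience` consecutive checkpoints fail to
    beat the best loss seen so far by more than `min_delta`.
    Returns None when val_losses is empty.
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")

    best_index: Optional[int] = None
    best_loss = float("inf")
    stale = 0  # consecutive checkpoints without improvement

    for index, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best checkpoint: remember it and reset the counter.
            best_index, best_loss, stale = index, loss, 0
        else:
            stale += 1
            if stale >= patience:
                break  # validation loss stopped improving

    return best_index


# Hypothetical validation losses, one per checkpoint (made-up numbers).
example_losses = [2.10, 1.40, 0.95, 0.80, 0.84, 0.91, 1.05, 1.30]
best = pick_stop_checkpoint(example_losses, patience=3)
print(f"keep checkpoint {best}")
```

With these made-up numbers the validation loss bottoms out at the fourth checkpoint (index 3) and then rises, which is the typical sign of overfitting; later checkpoints would fit the training set better but generalize worse. The validation samples should be kept out of training, and ideally come from the real images you want to read rather than only from the fine-tuning set.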