JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
https://www.jaided.ai
Apache License 2.0
23.59k stars, 3.09k forks

Training does not create custom_model.pth #1177

Open kiprock opened 9 months ago

kiprock commented 9 months ago

Hello,

After training (1000 train images, 200 test images) I get a list of .pth files in the saved_models dir. However, none of them is named mycustommodel.pth, and I am not sure which one to use. I would try best_accuracy.pth, but it is the oldest file, generated at the start of training. Can anyone tell me which one to use? Thank you!!

I have carefully read the training documentation.

kip@ubi18:~/CODE/EasyOCR/trainer/saved_models/MyCustomModel$ ls -l
total 470548
-rw-rw-r-- 1 kip kip 15054027 Dec 5 00:42 best_accuracy.pth
-rw-rw-r-- 1 kip kip 15054027 Dec 5 00:42 best_norm_ED.pth
...
-rw-rw-r-- 1 kip kip 15054027 Dec 5 02:06 iter_80000.pth
-rw-rw-r-- 1 kip kip 15054027 Dec 5 02:21 iter_90000.pth
-rw-rw-r-- 1 kip kip 834 Dec 5 00:14 log_dataset.txt
-rw-rw-r-- 1 kip kip 9716 Dec 5 07:16 log_train.txt
-rw-rw-r-- 1 kip kip 874 Dec 5 00:14 opt.txt

adwaitt-pandya commented 9 months ago

Were you able to solve this? I'm getting the same issue.

kiprock commented 9 months ago

As far as I can understand, after looking at the times on these files, there are a few iter_xxxx.pth files created BEFORE best_accuracy.pth and best_norm_ED.pth. I don't fully understand everything, but I suggest you try using one of these 2 files (rename it to yourmodel.pth), then follow the directions at the bottom of https://github.com/JaidedAI/EasyOCR/blob/master/custom_model.md
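For anyone who wants to script the rename step, here is a minimal sketch. It copies a chosen checkpoint under a new name instead of renaming it in place, so the original in saved_models stays untouched. The helper name install_checkpoint and the network name "mycustommodel" are made up for illustration. The default target directory ~/.EasyOCR/model is what I understand custom_model.md to use, so treat it as an assumption and check it against your setup. As far as I can tell, that document also expects a matching .yaml and .py file in the user_network directory, which this sketch does not create.

```python
import shutil
from pathlib import Path


def install_checkpoint(checkpoint, network_name, model_dir=None):
    """Copy a trained checkpoint to <model_dir>/<network_name>.pth.

    Hypothetical helper, not part of EasyOCR.
    """
    checkpoint = Path(checkpoint)
    # Assumption: EasyOCR's default model directory; pass model_dir if yours differs.
    if model_dir is None:
        model_dir = Path.home() / ".EasyOCR" / "model"
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)

    target = model_dir / f"{network_name}.pth"
    # copy2 keeps the source file and its timestamps intact.
    shutil.copy2(checkpoint, target)
    return target


# Example usage (paths and names are placeholders):
# install_checkpoint("saved_models/MyCustomModel/best_accuracy.pth", "mycustommodel")
#
# Then, with easyocr installed and the matching .yaml/.py files in place:
# import easyocr
# reader = easyocr.Reader(["en"], recog_network="mycustommodel")
```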

The later iter_xxxx.pth files (the ones created after best_accuracy.pth and best_norm_ED.pth) can also be used, as they are later iterations of the model creation, but not guaranteed to work better. It's possible that models from later iterations, despite having gone through more training, could start overfitting to the training data, leading to reduced performance on unseen data. The models saved earlier (best_accuracy.pth and best_norm_ED.pth) might have struck a better balance between learning and generalizing.

It's easy to test the various .pth files once you have everything set up. Hope that helps.
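One way to compare the files is to score each on the same held-out labelled images and rank them. The sketch below is only an outline under assumptions: the function names are hypothetical, each "predictor" is any callable you write that takes an image path and returns the recognized text (for example, a thin wrapper around a reader loaded with that checkpoint), and exact string match is used as the metric, which may be stricter than what the trainer logs.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def exact_match_accuracy(predict: Callable[[str], str],
                         samples: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (image_path, label) pairs where the prediction equals the label."""
    samples = list(samples)
    if not samples:
        return 0.0
    correct = sum(1 for image_path, label in samples if predict(image_path) == label)
    return correct / len(samples)


def rank_checkpoints(predictors: Dict[str, Callable[[str], str]],
                     samples: Iterable[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Return (checkpoint_name, accuracy) pairs sorted from best to worst."""
    samples = list(samples)  # reuse the same samples for every checkpoint
    scores = [(name, exact_match_accuracy(predict, samples))
              for name, predict in predictors.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

If a later iter_xxxx.pth scores lower than best_accuracy.pth on images the model never saw during training, that is consistent with the overfitting concern above.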

The documentation really needs to be clearer.

adwaitt-pandya commented 9 months ago

Thanks a lot Kip!