courao / ocr.pytorch

A pure pytorch implemented ocr project including text detection and recognition
MIT License
580 stars 133 forks

Why is my accuracy always 0? #34

Open vasu03042000 opened 4 years ago

vasu03042000 commented 4 years ago

Hi author, I am a beginner and have been training the model on my own dataset using your implementation, but the accuracy is 0% after every epoch even though the training loss keeps decreasing. Why is that?

courao commented 4 years ago

Hi, there are many possible causes for results like this: too few training samples, a wrong mapping from the raw prediction vectors to the final text, or an unsuitable alphabet (check that every character in your labels is included in the alphabet). Check the predicted output and compare it with the ground truth. If they are similar, you can train for longer or add more training data.
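The checks suggested above can be sketched in a few lines of plain Python. This is only an illustration, not the repo's own code: the function names are hypothetical, and the decoder assumes the common CRNN/CTC convention that index 0 is the blank and index `i > 0` maps to `alphabet[i - 1]`, which may differ from what your training script uses.

```python
from itertools import groupby


def missing_chars(labels, alphabet):
    """Return the characters that occur in the labels but not in the alphabet."""
    allowed = set(alphabet)
    return {ch for text in labels for ch in text if ch not in allowed}


def ctc_greedy_decode(indices, alphabet, blank=0):
    """Collapse consecutive repeats, then drop blanks.

    Assumption: index 0 is the CTC blank and index i > 0 maps to alphabet[i - 1].
    """
    collapsed = [k for k, _ in groupby(indices)]
    return "".join(alphabet[i - 1] for i in collapsed if i != blank)


def exact_match_accuracy(predictions, targets):
    """Fraction of predictions that equal their ground-truth string exactly."""
    if not targets:
        return 0.0
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(targets)
```

If `missing_chars` returns a non-empty set, those labels can never be predicted exactly. If the decoder returns empty strings for every image, the model is emitting only blanks, which would also give 0% accuracy while the loss still goes down.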

vasu03042000 commented 4 years ago

Actually, it is not predicting anything after any epoch. By the way, how large should the dataset be? I am training on 1200 images.

courao commented 4 years ago

> Actually, it is not predicting anything after any epoch. By the way, how large should the dataset be? I am training on 1200 images.

1200 does not seem to be enough; 10,000 or more would be better. You can add some synthetic data to overcome under-fitting.

vasu03042000 commented 4 years ago

Could you please share your dataset with me? Going through it would help me a lot. My email is vasugupta2000@gmail.com

vasu03042000 commented 4 years ago

I won't be commercializing it; I just need it for personal work.

courao commented 4 years ago

> Could you please share your dataset with me? Going through it would help me a lot. My email is vasugupta2000@gmail.com

Hi, actually most of my training data is synthetic. You can combine part of your real data with synthetic data to form the training set and hold out the rest of the real data for testing. A lot of repos on GitHub can generate large-scale synthetic data, such as .
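A rough sketch of that split, in case it helps. The function name, the `(image_path, label)` sample format, and the 20% test fraction are all assumptions made for illustration, not something taken from this repo.

```python
import random


def build_splits(real_samples, synthetic_samples, test_fraction=0.2, seed=0):
    """Hold out part of the real data for testing; train on the rest plus synthetic data.

    Samples are assumed to be (image_path, label) tuples.
    """
    rng = random.Random(seed)
    real = list(real_samples)
    rng.shuffle(real)

    n_test = int(len(real) * test_fraction)
    test = real[:n_test]  # real images only, never seen in training

    train = real[n_test:] + list(synthetic_samples)
    rng.shuffle(train)  # mix real and synthetic samples together
    return train, test
```

Keeping the test set purely real means the reported accuracy reflects the images you actually care about, not how well the model fits the synthetic renderer.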