clovaai / deep-text-recognition-benchmark

Text recognition (optical character recognition) with deep learning methods, ICCV 2019
Apache License 2.0

Fine tuning the pretrained model #219

Open

anavc94 commented 4 years ago

Hi,

I am using a CRAFT + this library pipeline to locate and do character-level recognition using a pretrained model you provided (TPS-ResNet-BiLSTM-Attn-case-sensitive.pth). However, I am stuck at a point where the recognizer does not seem to give better results, and I want to train my own model with my own data. My data would consist of a set of images of extremely low resolution (about 20x40 or so), as they are single-character (number/letter) images, not words. Luckily, all the characters I want to recognize look similar, i.e. similar fonts and sizes. These are examples of images I would like to recognize:

5_train 3_train 4_train

as "3, U, 7".

The approach I think I am going to use is fine tuning the pretrained model TPS-ResNet-BiLSTM-Attn-case-sensitive.pth:

-> In this case, how many images per kind of character should I use? I really need this info.
-> If I only train on some digits (for example "1" and "7", as they are often confused), can recognition of the rest of my letters/digits get worse?
-> Would it be good to use my own data plus a public character recognition dataset such as http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/ ? That would be easier for me, as I don't have much data.
-> Or is it better to train from scratch with my own data + chars74k?

Hope someone can guide me a bit. Btw, congratulations on both repositories, they are really good!

Ana

ku21fan commented 4 years ago

Hello :)

These are open questions, and the answers probably all depend on the dataset, so they are hard to answer...

If I were you, I would start with an experiment using chars74k. Training from scratch would work: I have trained with chars74k before, and it worked well. But I am not sure it is better than fine-tuning from the pretrained model (I have not tested/compared them).
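One caveat when fine-tuning: if you change the `--character` set, the final prediction layer's shape will no longer match the checkpoint, so only the shape-compatible weights can be copied over. A minimal sketch of that selection logic (the parameter names and shapes below are illustrative, not the real checkpoint keys):

```python
def compatible_keys(pretrained_shapes, model_shapes):
    """Names of tensors that can be copied unchanged into the new model.

    Both arguments map parameter name -> shape tuple; anything missing
    or shape-mismatched stays randomly initialized in the new model.
    """
    return sorted(
        name for name, shape in pretrained_shapes.items()
        if model_shapes.get(name) == shape
    )

# Illustrative only: the case-sensitive model predicts ~96 symbols, while a
# digits-only fine-tune predicts far fewer, so the generator layer differs.
pretrained = {"FeatureExtraction.conv0_1.weight": (32, 1, 3, 3),
              "Prediction.generator.weight": (96, 256)}
new_model = {"FeatureExtraction.conv0_1.weight": (32, 1, 3, 3),
             "Prediction.generator.weight": (13, 256)}
print(compatible_keys(pretrained, new_model))
```

With PyTorch you would build the same name-to-shape dicts from `torch.load(checkpoint)` and the new model's `state_dict()`, then load the surviving subset via `load_state_dict(..., strict=False)`.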

Remember to change the image size from 32x100 to 32x32 or similar for character recognition.
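A possible fine-tuning invocation, assuming the flag names from this repo's `train.py` (`--FT` loads the checkpoint for fine-tuning; the dataset paths are placeholders for your own lmdb folders, and `--imgH`/`--imgW` reflect the smaller character crops):

```shell
CUDA_VISIBLE_DEVICES=0 python3 train.py \
  --train_data data_lmdb/training --valid_data data_lmdb/validation \
  --select_data / --batch_ratio 1 \
  --Transformation TPS --FeatureExtraction ResNet \
  --SequenceModeling BiLSTM --Prediction Attn \
  --imgH 32 --imgW 32 \
  --sensitive \
  --saved_model TPS-ResNet-BiLSTM-Attn-case-sensitive.pth --FT
```

`--select_data / --batch_ratio 1` tells the loader to use everything under the lmdb root as a single dataset instead of the default MJ/ST mixture.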

I wish you good luck, Best.

anavc94 commented 4 years ago

Hi @ku21fan !

Thank you for your suggestion! In fact, I was preparing a training run with the chars74k dataset + ICDAR2003 + some images from my own dataset (about 100). Maybe I will add some more datasets in the future, such as ICDAR2005 or MNIST. As you have not compared training from scratch vs. fine-tuning the pretrained model, I will update with my results so that everybody can have a reference, but it is good to know that you have been able to train from scratch with chars74k :)

Regards! Ana

ilyak93 commented 3 years ago

@anavc94, Hi. Did you try to predict words with "character-level detection and recognition", i.e. to learn to predict words by training only on a character dataset? Did it work? I tried it in my own language with a character dataset, and it learned to identify words with about 80% accuracy, but it did not learn to predict sequences of words. I wonder if you had any other results or interesting insights.

vallimangai commented 1 year ago

@ilyak93, Hi. I am trying character-level detection. If your code is working at 80%, sharing the code or the logic would help me a lot. Thank you.

zaklabs commented 1 year ago

I have the same issue. Using an existing pretrained model (TPS-ResNet-BiLSTM-Attn.pth), some numeric characters are not recognized properly. I want to fine-tune (FT) it with my own data. If there is any source code for this, please help me. Thank you.