Holmeyoung / crnn-pytorch

Pytorch implementation of CRNN (CNN + RNN + CTCLoss) for all language OCR.
MIT License
377 stars, 105 forks

Methodology for training #12

mariembenslama closed this issue 5 years ago

mariembenslama commented 5 years ago

Hello, sorry for asking so many questions 😅 I was thinking of a way for the program to grasp all the characters.

What if we start by teaching it, for example, a small number of characters and then add the others little by little?

For example, we give it a dataset of only 50 characters and see how it performs, and then next time we add a dataset of new ones and see if it's able to differentiate the features.

Do you think an approach like this is fine? Or is it better to give it everything from the beginning?
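
To make the idea concrete, here is a minimal sketch of the staged approach described above, assuming the dataset is a list of `(image_path, label)` pairs. The names `build_stages` and `samples_for_stage`, the toy alphabet, and the file names are all hypothetical and not part of this repository. Each stage only keeps samples whose labels use characters from a growing subset of the alphabet.

```python
from typing import Iterable, List, Sequence, Set, Tuple

# Hypothetical sample layout: (image_path, label_text).
Sample = Tuple[str, str]


def build_stages(alphabet: Sequence[str], first: int, step: int) -> List[Set[str]]:
    """Return growing character sets: `first` chars, then `step` more per stage."""
    if first <= 0 or step <= 0:
        raise ValueError("first and step must be positive")
    stages = []
    size = first
    while True:
        stages.append(set(alphabet[:size]))
        if size >= len(alphabet):
            break
        size = min(size + step, len(alphabet))
    return stages


def samples_for_stage(samples: Iterable[Sample], allowed: Set[str]) -> List[Sample]:
    """Keep only samples whose label is written entirely with allowed characters."""
    return [s for s in samples if set(s[1]) <= allowed]


# Toy data standing in for a real OCR dataset.
alphabet = "abcdefghij"
samples = [("0.png", "abc"), ("1.png", "bed"), ("2.png", "jig"), ("3.png", "cab")]

stages = build_stages(alphabet, first=3, step=4)
stage_sizes = [len(samples_for_stage(samples, allowed)) for allowed in stages]
print(stage_sizes)
```

One thing to keep in mind with this sketch: if the model's output layer is sized for the full alphabet from the start, the weights can carry over from one stage to the next without resizing anything. Whether this actually improves accuracy over training on everything at once would need to be checked experimentally.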

niddal-imam commented 5 years ago

Hi,

I have read an article that mentions an approach for training text recognition models called curriculum learning. I think it matches your idea. This is the title of the paper: Rosetta: Large Scale System for Text Detection and Recognition in Images.

I hope it helps.
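
For illustration, one common form of curriculum learning for text recognition orders the training data from easy to hard, for example by label length. The sketch below assumes that interpretation; the schedule, the parameter values, and the function names are hypothetical and are not claimed to be the exact procedure from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical sample layout: (image_path, label_text).
Sample = Tuple[str, str]


def max_length_for_epoch(epoch: int, start_len: int, final_len: int,
                         epochs_per_step: int) -> int:
    """Allow one more character of label length every `epochs_per_step` epochs."""
    return min(final_len, start_len + epoch // epochs_per_step)


def subset_for_epoch(samples: Sequence[Sample], epoch: int, start_len: int,
                     final_len: int, epochs_per_step: int) -> List[Sample]:
    """Return the samples that are short enough to train on at this epoch."""
    limit = max_length_for_epoch(epoch, start_len, final_len, epochs_per_step)
    return [s for s in samples if len(s[1]) <= limit]


# Toy data standing in for a real OCR dataset.
samples = [("a.png", "hi"), ("b.png", "hello"), ("c.png", "curriculum"), ("d.png", "ocr")]

limits = [max_length_for_epoch(e, start_len=3, final_len=10, epochs_per_step=2)
          for e in range(6)]
counts = [len(subset_for_epoch(samples, e, 3, 10, 2)) for e in (0, 4, 20)]
print(limits, counts)
```

In a real training loop, the subset returned for each epoch would be wrapped in whatever data loader the training script already uses.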

mariembenslama commented 5 years ago

Thanks for the answer, I just checked out the article and it seems interesting! But I wonder if it will work in our case. Will feeding the algorithm little by little really bring high accuracy?

I guess going with Holmeyoung's idea (large dataset = better results) is probably the right thing in this case.

But my machine's limits, plus the fact that I'm using Google Colab (only 12 hours of free GPU use) as my DL tool, put me at a disadvantage when creating a large dataset 😌 That's why I thought about this idea.

Anyway, thanks a lot for the answer 😄 Also, do you recommend any DL tools/environments?

niddal-imam commented 5 years ago

No problem. Actually, I am using Paperspace, but unfortunately it is not free 😌.

mariembenslama commented 5 years ago

Hehe I see, it's troubling, right 😅

Well, it's worth it when your program works 😊

Thanks for the suggestion and the answer :)

niddal-imam commented 5 years ago

Yes, it is worth it.

Good luck.