Holmeyoung / crnn-pytorch

Pytorch implementation of CRNN (CNN + RNN + CTCLoss) for all language OCR.
MIT License
377 stars, 105 forks

Training accuracy #10

Closed: niddal-imam closed this issue 5 years ago

niddal-imam commented 5 years ago

Hi Holmeyoung,

I have 20,000 training samples and 30 characters. I have been trying to train the model, but the accuracy does not improve. How should I set the parameters?

Holmeyoung commented 5 years ago

Hi, did the training loss decrease? If so, be patient; the accuracy should improve after several epochs!

niddal-imam commented 5 years ago

Hi,

Yes, it decreases very slowly. I will give it more time.

Thanks

Holmeyoung commented 5 years ago

Hi, how is the training accuracy? If it keeps going up and down, you can turn the lr down to 0.0001, as I said in #2.

niddal-imam commented 5 years ago

Hi,

Yes, exactly, it keeps fluctuating. I will change the lr and see.

Thanks

niddal-imam commented 5 years ago

Hi Holmeyoung,

I was able to train the model after generating a clean dataset. However, my question is when to stop training. The accuracy is increasing and is about to reach 100%, which may cause the model to overfit. So, when should I stop?

Thanks,

Holmeyoung commented 5 years ago

Hi, that's good news. As for your question, this is exactly what the val dataset is for. The common strategy is no-improvement-in-n: keep track of the best accuracy, and if the model's performance on the val dataset has not improved after several epochs, it is time to stop.

But how do we define "several"? If the accuracy takes 100 or 1000 epochs to become stable, we should wait about 10 or 20 epochs. But if the model is already well trained after 10 epochs (in this case, the model usually reaches about 80% accuracy in 1 epoch), waiting about 3 to 5 epochs is OK. One more thing: this is also why the model is saved at every set interval, to avoid missing the best one. Hope this helps.
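
The no-improvement-in-n rule described above can be sketched in a few lines. This is a minimal illustration, not code from this repository: the function name `early_stopping_epoch`, the `patience` parameter, and the accuracy history are all made up for the example.

```python
def early_stopping_epoch(val_accuracies, patience):
    """Return (best_epoch, stop_epoch) under a no-improvement-in-n rule.

    val_accuracies: validation accuracy measured after each epoch.
    patience: number of epochs to wait without a new best before stopping.
    """
    best_acc = float("-inf")
    best_epoch = -1
    for epoch, acc in enumerate(val_accuracies):
        if acc > best_acc:
            # New best: remember it (this is the checkpoint worth keeping).
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return best_epoch, epoch
    # Ran out of epochs before the patience was exhausted.
    return best_epoch, len(val_accuracies) - 1


# Hypothetical validation accuracies, one value per epoch.
history = [0.60, 0.81, 0.90, 0.94, 0.95, 0.95, 0.94, 0.95, 0.93]
best_epoch, stop_epoch = early_stopping_epoch(history, patience=3)
print(best_epoch, stop_epoch)
```

The checkpoint to keep is the one from `best_epoch`, not the one from `stop_epoch`, which is why saving at regular intervals matters.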

niddal-imam commented 5 years ago

Thank you again for your cooperation. Yes, the accuracy became stable at 95% after 30 epochs. However, when I test the model on new samples, it cannot recognise the words. When I use samples similar to the ones used for training, it recognises them.

Holmeyoung commented 5 years ago

Hi, since you want to predict on noisy images, you shouldn't train on clean data. Try to make your training samples look like the images you actually want to predict on. If you just want to train on a small dataset, you can add dropout to the net to avoid overfitting. In models/crnn.py, change

nn.LSTM(nIn, nHidden, bidirectional=True)

to

nn.LSTM(nIn, nHidden, bidirectional=True, dropout=0.5)
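
A caveat on the change above: in PyTorch, the `dropout` argument of `nn.LSTM` is applied between stacked LSTM layers only, so with the default `num_layers=1` it has no effect and PyTorch emits a warning. A possible alternative is an explicit `nn.Dropout` on the LSTM output. The sketch below assumes the recurrent block resembles the usual CRNN `BidirectionalLSTM` module; the class layout, the `p_drop` parameter, and the sizes are illustrative assumptions, not taken from models/crnn.py.

```python
import torch
import torch.nn as nn


class BidirectionalLSTM(nn.Module):
    """Bidirectional LSTM followed by dropout and a linear projection."""

    def __init__(self, nIn, nHidden, nOut, p_drop=0.5):
        super().__init__()
        # Single-layer LSTM: the built-in `dropout` argument would do nothing here.
        self.rnn = nn.LSTM(nIn, nHidden, bidirectional=True)
        # Explicit dropout on the LSTM outputs (only active in train mode).
        self.drop = nn.Dropout(p_drop)
        self.embedding = nn.Linear(nHidden * 2, nOut)

    def forward(self, x):
        # x: (T, b, nIn) -> recurrent: (T, b, 2 * nHidden)
        recurrent, _ = self.rnn(x)
        recurrent = self.drop(recurrent)
        T, b, h = recurrent.size()
        out = self.embedding(recurrent.view(T * b, h))
        return out.view(T, b, -1)


# Illustrative sizes: 31 outputs = 30 characters + 1 CTC blank.
layer = BidirectionalLSTM(nIn=512, nHidden=256, nOut=31)
layer.eval()  # dropout is disabled at evaluation time
with torch.no_grad():
    y = layer(torch.randn(26, 4, 512))
print(y.shape)
```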

niddal-imam commented 5 years ago

Sure, I will try using the dropout.

Thanks

aaobscure commented 5 years ago

@Holmeyoung @niddal-imam

Hi guys,

I have a question,

I want to train on my own dataset (Farsi language), but I cannot understand what the shape of the dataset should be.

In my case, every image in the dataset contains 5 sentences (3 numbers, 2 names). Can you tell me how to prepare them for training?

Do I have to crop all of them, or can I train on them as one image (I mean all 5 sentences in one image)?

For example: Sentence 1: niddal, Sentence 2: 10, Sentence 3: Imam .....

Best,

niddal-imam commented 5 years ago

Hi,

From my experience, you have to crop all of your images. The dataset that I used contained sentences, but I cropped all the images to build my model. The shape of the dataset should be as Holmeyoung explains in the Readme file:

absolute/path/to/image/一身转战_0.jpg 一身转战
absolute/path/to/image/三千里_1.jpg 三千里
absolute/path/to/image/一剑曾当百万师_2.jpg 一剑曾当百万师
absolute/path/to/image/3.jpg 一剑曾当百万师
absolute/path/to/image/一 剑 曾 当 百 万 师_4.jpg 一 剑 曾 当 百 万 师
absolute/path/to/image/niddal.jpg niddal
absolute/path/to/image/imam.jpg imam
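
Each entry in the list above is an image path followed by its label. Since both the path and the label may contain spaces, splitting on the first space is not enough. The sketch below splits after the image extension instead. How the repository itself parses these lines is not shown in this thread, so `parse_label_line` and the extension list are assumptions made for illustration.

```python
import re

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_EXT = re.compile(r"\.(jpg|jpeg|png|bmp)\s+", re.IGNORECASE)


def parse_label_line(line):
    """Split 'path/to/img.jpg some label' into (path, label)."""
    line = line.rstrip("\n")
    match = IMAGE_EXT.search(line)
    if match is None:
        raise ValueError("no image extension followed by a label: %r" % line)
    # Path ends at the extension; the label is everything after the whitespace.
    path = line[:match.end()].rstrip()
    label = line[match.end():]
    return path, label


print(parse_label_line("absolute/path/to/image/niddal.jpg niddal"))
print(parse_label_line("absolute/path/to/image/一 剑 曾 当 百 万 师_4.jpg 一 剑 曾 当 百 万 师"))
```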

aaobscure commented 5 years ago

@niddal-imam

Thanks for your response.

There is no problem with training, but for testing I want to give the model the whole image (I mean all 5 sentences). Is this model able to detect all 5 sentences in the test phase?

niddal-imam commented 5 years ago

Yes, it can recognize sentences. However, words are going to be connected to each other. For example, Niddal Imam will be recognized as niddalimam.

aaobscure commented 5 years ago

@niddal-imam

So there is no way to correct this problem?

Also, I have another question:

In Farsi, as in Arabic, when two letters are connected their shapes change, such as "ح" in "حا". What should I do about this problem?

Again thanks for being kind.

niddal-imam commented 5 years ago

I have not found a solution to this problem yet. However, in my project I used a text detection model that detects words instead of sentences. After that, the recognition model can extract the words correctly. Regarding the second question, you are right: the model will recognize "ح" and "ا" as "حا" because the letters are connected. I do not know how to solve this problem, because CTC separates characters by blanks.
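
One point that may help here: in Unicode, Arabic and Farsi letters are stored as one code point per letter regardless of how they are joined on screen, so a label such as "حا" is still two separate classes for CTC. The sketch below shows this together with a plain greedy CTC decode (collapse repeats, then drop blanks). The two-letter alphabet, the frame sequence, and the function name are made up for illustration, and this does not show how well the network learns the different visual forms of each letter.

```python
def ctc_greedy_decode(frame_labels, alphabet, blank=0):
    """Collapse repeated labels and drop blanks (greedy CTC decoding).

    Index 0 is the blank; index k > 0 maps to alphabet[k - 1].
    """
    chars = []
    prev = blank
    for idx in frame_labels:
        if idx != blank and idx != prev:
            chars.append(alphabet[idx - 1])
        prev = idx
    return "".join(chars)


# U+062D ARABIC LETTER HAH, U+0627 ARABIC LETTER ALEF
alphabet = "\u062d\u0627"
label = "\u062d\u0627"  # renders as a connected pair, but holds two code points
print(len(label))

# Hypothetical per-frame argmax output: blank, HAH, HAH, blank, ALEF, ALEF, blank
frames = [0, 1, 1, 0, 2, 2, 0]
decoded = ctc_greedy_decode(frames, alphabet)
print(decoded == label)
```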

aaobscure commented 5 years ago

@niddal-imam

I have trained the network, but the problem is that the model is not saved??

The path in "params.py" is expr, and this folder is created in my project directory, but the weights are not saved!!

Do you know how to correct this?

niddal-imam commented 5 years ago

Hi

You need to change these parameters:

displayInterval = 100
valInterval = 1000
saveInterval = 1000

For more information, please refer to #3 .

SreenijaK commented 5 years ago

@aaobscure was your issue resolved? I am also unable to save my model to the expr folder, even with these parameters: displayInterval = 100, valInterval = 1000, saveInterval = 1000.

@niddal-imam can you help me here?

niddal-imam commented 5 years ago

You can either change these parameters or use a larger training dataset. For example:

displayInterval = 10
valInterval = 50
saveInterval = 50

If that does not work, lower these parameters further.
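
A rough way to see why no checkpoint appears is to count how often the save condition can fire. The sketch below assumes a checkpoint is written whenever an iteration counter starting at 1 is a multiple of saveInterval. Whether that counter resets every epoch is not stated in this thread, so both cases are covered. The function name and the dataset numbers are hypothetical.

```python
import math


def count_checkpoints(num_samples, batch_size, epochs, save_interval,
                      counter_resets_each_epoch=True):
    """Estimate how many checkpoints a run would write (assumed save rule)."""
    iters_per_epoch = math.ceil(num_samples / batch_size)
    if counter_resets_each_epoch:
        # The counter never exceeds iters_per_epoch, so a large interval never fires.
        return epochs * (iters_per_epoch // save_interval)
    # Global counter: only the total number of iterations matters.
    return (iters_per_epoch * epochs) // save_interval


# Hypothetical run: 2,000 samples, batch size 64, 10 epochs (32 iterations per epoch).
for interval in (1000, 50, 10):
    print(interval,
          count_checkpoints(2000, 64, 10, interval),
          count_checkpoints(2000, 64, 10, interval, counter_resets_each_epoch=False))
```

Under these assumptions, an interval larger than the number of iterations per epoch can mean that nothing is ever saved, which is consistent with the advice to lower the intervals or use more data.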