Holmeyoung / crnn-pytorch

Pytorch implementation of CRNN (CNN + RNN + CTCLoss) for all language OCR.
MIT License
377 stars 105 forks

Fine tune for longer text > 26 #19

Closed mariembenslama closed 5 years ago

mariembenslama commented 5 years ago

Hello, long time no see :))

I would like to ask how to fine-tune the model (change the CRNN inputs) for text longer than 26 characters, or for text with two or more lines?

Holmeyoung commented 5 years ago

Hi, long~~~time no see.

  1. If you want to input text longer than 26 characters:
     1.1 Change the image resize width. That way, the output sequence will be longer than 26.
     1.2 Change the model shape. This is not the suggested way.

  2. Text with two lines or more:
     2.1 I once did OCR on images with two or more lines. My solution was projection segmentation: split the image into single lines and run OCR on each one. Convert the image to grayscale. The rows that contain characters will be darker. Sum the pixel values of each row and you will find where to separate the lines.

Oh my stupid English expression... Hope you can understand.

[image]
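The link between resize width and output length in item 1.1 can be made concrete with a little arithmetic. The sketch below assumes the network follows the commonly used CRNN layer layout (two stride-2 poolings, two poolings with stride 1 and padding 1 along the width, and a final unpadded 2x2 convolution). The thread does not confirm this repository's exact layers, and the helper names `crnn_time_steps` and `ctc_min_steps` are hypothetical. Under that assumption, an input width of 100 gives 26 time steps.

```python
def crnn_time_steps(img_w):
    """Estimate the output sequence length T for a given input image width.

    Assumes the common CRNN layout; check the model definition before relying on it.
    """
    w = img_w // 2   # first 2x2 max-pool, stride 2
    w = w // 2       # second 2x2 max-pool, stride 2
    w = w + 1        # pooling with stride 1 and padding 1 along the width
    w = w + 1        # the same kind of pooling again
    return w - 1     # final 2x2 convolution without padding


def ctc_min_steps(label):
    """Fewest time steps CTC needs to emit `label`.

    CTC needs one step per character plus a blank between adjacent identical characters.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


for width in (100, 160, 280):
    print(width, crnn_time_steps(width))
```

If the assumption holds, a 40-character label needs at least 40 time steps, which corresponds to a resize width of roughly 156 pixels or more, and a little more again when the text contains doubled letters.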

mariembenslama commented 5 years ago

Thank you very much for the answer :D And lol yappp long time no see and P.S: Your english is so good u've got no worries ;)

tumbleintoyourheart commented 5 years ago

Hi, what exactly do you mean by changing the image resize width? Is it imgW = 100 in params.py?

mariembenslama commented 5 years ago

I guess it's about changing the size of the input images that you train on. In the "generator.py" file, inside the darken_func function, there is a commented-out line:

image = img.resize(...)

Uncomment it and update the size.

mariembenslama commented 5 years ago

Question: How do I change the code so that it can read two lines? (Where exactly?)

P.S.: The space between the lines is small, exactly like in the image in your comment.

Holmeyoung commented 5 years ago

Hi, the CNN and RNN operate on the whole image, so if we want to read two lines, we must split them before training, that is, treat them as two images. When recognizing a two-line image, we also need to split it into single lines first and then read each one. As for how to split the image, there is a model called CTPN that is used to locate text. You can also use projection segmentation, as I described above.
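The projection segmentation described here can be sketched in a few lines. This is only an illustration of the idea, not code from the repository: it uses plain Python lists so it runs without dependencies, assumes dark text on a light background, and the names `split_lines`, `crop_lines` and the `min_ink` threshold are hypothetical.

```python
def split_lines(gray, min_ink=1):
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    gray: list of pixel rows, each a list of 0-255 ints (dark text, light background).
    min_ink: a row counts as text once its summed darkness reaches this value.
    """
    # Horizontal projection: total darkness of every pixel row.
    profile = [sum(255 - px for px in row) for row in gray]

    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # a blank row ends the line
            start = None
    if start is not None:                  # text runs to the bottom edge
        lines.append((start, len(profile)))
    return lines


def crop_lines(gray, min_ink=1):
    """Cut the image into one sub-image per detected text line."""
    return [gray[top:bottom] for top, bottom in split_lines(gray, min_ink)]


W, B = 255, 0  # white and black pixels
page = [
    [W, W, W, W],
    [W, B, B, W],
    [W, B, W, W],
    [W, W, W, W],
    [B, W, W, B],
    [W, W, W, W],
]
print(split_lines(page))  # [(1, 3), (4, 5)]
```

When the gap between lines is small or noisy, a higher `min_ink` (or smoothing the profile first) would probably be needed. Each crop can then be resized and passed to the recognizer as an ordinary single-line image.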

mariembenslama commented 5 years ago

Yes, I know the CTPN model. However, my text detection solution is already in place, so the input to the final system is an image that contains one or two lines.

Splitting the two lines is my second choice. My first choice was for the RNN to be able to read two lines properly. I would also like to know how to set up "projection segmentation" in the code: in which file, and where?

Holmeyoung commented 5 years ago

Hi, I have a great idea, but I don't know if it will work. Just use CRNN, but this time the input to your RNN layer has two rows. For example, it used to be [32, 1, 26, 256] (batch_size, height, T length (width), num). Now it is [32, 2, 26, 256] (batch_size, height, T length (width), num). Then you can flatten it into one row, [32, 1, 52, 256], and run CRNN as before!!!
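The flattening step in this idea can be illustrated on a tiny stand-in. The sketch below uses nested Python lists instead of tensors and follows the (batch_size, height, T, num) layout given in the comment; `flatten_rows` is a hypothetical helper, not part of the repository.

```python
def flatten_rows(features):
    """Turn [batch][height][T][num] nested lists into [batch][1][height * T][num].

    The time steps of the top feature row come first, then those of the next row.
    """
    return [[[step for row in sample for step in row]] for sample in features]


# Tiny stand-in: batch of 1, 2 feature rows, 3 time steps, 2 channels.
features = [[
    [[1, 1], [2, 2], [3, 3]],   # top feature row
    [[4, 4], [5, 5], [6, 6]],   # bottom feature row
]]

flat = flatten_rows(features)
print(len(flat[0]), len(flat[0][0]))   # 1 6
print(flat[0][0])                      # top-row steps, then bottom-row steps
```

With a contiguous PyTorch tensor in the same layout, a call such as `x.reshape(32, 1, 52, 256)` should produce the same ordering, although that is untested here.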

mariembenslama commented 5 years ago

I see, so you mean we would step through the features (somewhat like striding) two rows at a time instead of one?

But if I'm correct, wouldn't that already hurt the recognition (cause wrong guesses)?

==> And if that's the case and our program already reads row by row, then by default it can actually read two lines 😄 lol

==> So the solution is to just enlarge the input image? (It's a suggestion)

Holmeyoung commented 5 years ago

Hi, sorry, I didn't check GitHub these days. If you want two rows, there are two solutions:

  1. Before putting the data into train and val, split each image into single lines using projection segmentation. When recognizing, also split the image into two single-line images.

  2. Not suggested: we used to feed one row of features into the LSTM, but now we would have two. The problem is that we can't be sure one feature row corresponds to exactly one line of characters.

OK, my email is shu_liyang@163.com. Send some of your data to me, and I will give you a solution.

mariembenslama commented 5 years ago

Alright, actually my data is a bit different. I have created several datasets that contain citizens' information.

=> They are all separate datasets, which means val_set.txt is different each time, because I want to run the CRNN training and test processes on each dataset separately and recognize each kind of pattern separately.

So should I just send you, for example, the dataset that contains only addresses?

mariembenslama commented 5 years ago

To conclude the issue: OCR (recognition) only works on a single line, so to do OCR on two lines we must first do text detection (not recognition), which is a pre-processing phase, using one of the approaches mentioned above, such as CTPN or projection segmentation.

After that we take the cropped lines, pass them through CRNN, get the output text, and order it according to the line order.
=> Most well-known solutions, such as the Google Vision API, do the same thing.
=> Issue is closed.
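The final step of this summary, reading the cropped lines and putting the text back in line order, might look like the sketch below. It assumes each detected line comes with its top and left coordinates. `read_in_order` is a hypothetical helper, and `recognize` stands for whatever function runs the trained CRNN on a single-line crop.

```python
def read_in_order(line_boxes, recognize):
    """Recognize each cropped line and join the results from top to bottom.

    line_boxes: list of (top, left, crop) tuples from the detection step.
    recognize: callable that maps one single-line crop to its text.
    """
    # Sort by vertical position first, then left to right for ties.
    ordered = sorted(line_boxes, key=lambda box: (box[0], box[1]))
    return "\n".join(recognize(crop) for _, _, crop in ordered)


# Stand-in recognizer: the "crops" here are already strings.
boxes = [(40, 0, "second line"), (5, 0, "first line")]
print(read_in_order(boxes, lambda crop: crop))
```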