bgshih / crnn

Convolutional Recurrent Neural Network (CRNN) for image-based sequence recognition.
MIT License
2.05k stars 548 forks source link

Work on line level instead of word level #67

Closed githubharald closed 5 years ago

githubharald commented 7 years ago

Hello,

I'm wondering if it is possible to work on complete lines instead of just on a word level. It can be hard to segment words in a line as a preprocessing step (depending on how "nice" the handwritten text looks). So if adding a new class, which I call "real blank" (not the CTC pseudo blank), then it should in theory be possible. Has anyone already tried this? In my experience so far, CTC works best with short sequences, but a line can be pretty long. Or is there some other way to avoid the preprocessing step of word segmentation?

How did you do this with the data from the ICFHR2016 Competition (*) in which CRNN was used?

(*) https://scriptnet.iit.demokritos.gr/competitions/4/

githubharald commented 6 years ago

just for those facing the same problem: it is really the best to include a new label for whitespace and to feed a complete text line into the network. The word segmentation is then learned by the network itself and works much better then any of the preprocessing (word segmentation) methods I've tried.

dajiangxiaoyan commented 6 years ago

@githubharald Do you try any other methods?

githubharald commented 6 years ago

what do you mean?

SnailTyan commented 6 years ago

@githubharald I want to train English model. Your "real blank" means "whitespace"? For example: Train mage: "I am a boy." Label: "I am a boy."(Unicode)

githubharald commented 6 years ago

there are two possibilities: a. input complete text lines and let NN learn whitespace label b. segment lines into words and let NN recognize those (single) words

It worked best for me to include the new whitespace label. I was then able to input complete text lines (as in your example). Of course, the input size of the NN has to be large enough. In my TensorFlow implementation of HTR, I used 128x32px for (single) word recognition and 800x64px for line recognition.

I also tried segmenting the lines into words. But this didn't quite work for my task (historical datasets). If the text you want to recognize is nicely written and word boundaries are much larger than inter-word-gaps, then segmentation may work. An easy method is described in [1] which you can try for your task.

[1] http://ciir.cs.umass.edu/pubfiles/mm-27.pdf

SnailTyan commented 6 years ago

@githubharald
Thanks a million. The input of your crnn model is 800x64px ? Do you modify the crnn network or resize images? From my point of view, I think the height of images should be 32px.

dajiangxiaoyan commented 6 years ago

@githubharald

  1. How do you generate training dataset? If the input image is not exactly 800*64, do you do any preprocessing?
  2. Also do you mind show the whole Network?
githubharald commented 6 years ago
  1. I create a target image of size 800x64. Then, I take the image from the dataset with width w and height h. The downscaling factor is then the larger one of w/800 and h/64. This means, the downsized image is not distorted, however, there will be some "empty" space left when copying it into the target image if it does not have the ratio of the target image. I calculate the background colour of the dataset image and fill the empty space with this colour.
  2. no, sorry, can't make code public.
dajiangxiaoyan commented 6 years ago

@githubharald Thanks very much. Your idea is very helpful for me. I have trained the model following you steps. And the model do well on simple background images, especially documents.

But works bad on other images, such as 'dogoodfor', the whitespace can not recognize. The words recognize very well. image

I wander if the training data is not proper. I synthesis text with https://github.com/ankush-me/SynthText. Firstly, synthesis English single words 2,000,000 Secondly, sythesis English short sentences 2,000,000. The short sentences come for three English novels, and every short sentence has most three words and two spaces. The number of distinct show sentences is 50,000.

Do you any idea why some whitespace can not recognize?

githubharald commented 6 years ago

if you're using the original CRNN, then there is no whitespace label included (therefore it can't be recognized). Did you add the whitespace label already and map it to the corresponding class?

dajiangxiaoyan commented 6 years ago

I use the original CRNN,with 400*32 input image. I generate training data with real whitespace,but replace the whitespace with another char in the label file, then training, after testing,will replace the char back to whitespace. Also how many words and whitespaces do you have in one text line training image?

dajiangxiaoyan commented 6 years ago

@githubharald

githubharald commented 6 years ago

just a rough guess: 6 words/line, that means 5 whitespace labels. And around 10000 lines in the training set.

dajiangxiaoyan commented 6 years ago

@githubharald Thanks very much. How much your precision about whitespace of lines? In my experiments, the validation precision is only 25% about whitespace.

githubharald commented 6 years ago

I get more than 90% (depending on dataset, parameters, ...). If you only get 25%, then there is something wrong. You replace a whitespace label by some other label. That's ok if you don't have this label in the dataset. E.g. if you use 'a' instead of ' ', then there must not be any word in the dataset containing 'a'. Is this true for your experiments?

dajiangxiaoyan commented 6 years ago

@githubharald
Yes. I doubt that I synthesis training text images in a wrong way. I will try some other training images. Again, Thanks very very much.

skt7 commented 5 years ago

I calculate the background colour of the dataset image and fill the empty space with this colour.

How do you do this, can you explain? @githubharald

jewelcai commented 5 years ago

@githubharald Thanks very much. Your idea is very helpful for me. I have trained the model following you steps. And the model do well on simple background images, especially documents.

But works bad on other images, such as 'dogoodfor', the whitespace can not recognize. The words recognize very well. image

I wander if the training data is not proper. I synthesis text with https://github.com/ankush-me/SynthText. Firstly, synthesis English single words 2,000,000 Secondly, sythesis English short sentences 2,000,000. The short sentences come for three English novels, and every short sentence has most three words and two spaces. The number of distinct show sentences is 50,000.

Do you any idea why some whitespace can not recognize?

I met with the same issue as you did. Did you resolve this issue?

wyichew2708 commented 4 years ago

The pretrained CRNN is only trained without white space, so kindly fine tune it with white space samples

ShadmanRohan commented 3 years ago

I use the original CRNN,with 400*32 input image. I generate training data with real whitespace,but replace the whitespace with another char in the label file, then training, after testing,will replace the char back to whitespace. Also how many words and whitespaces do you have in one text line training image?

Please share your experience... how many words per line gave good results?