Closed: githubharald closed this issue 5 years ago
just for those facing the same problem: it really works best to include a new label for whitespace and to feed a complete text line into the network. The word segmentation is then learned by the network itself and works much better than any of the preprocessing (word segmentation) methods I've tried.
@githubharald Did you try any other methods?
what do you mean?
@githubharald I want to train an English model. Does your "real blank" mean "whitespace"? For example: Train image: "I am a boy." Label: "I am a boy." (Unicode)
there are two possibilities: (a) input complete text lines and let the NN learn the whitespace label, or (b) segment lines into words and let the NN recognize those (single) words.
It worked best for me to include the new whitespace label. I was then able to input complete text lines (as in your example). Of course, the input size of the NN has to be large enough. In my TensorFlow implementation of HTR, I used 128x32px for (single) word recognition and 800x64px for line recognition.
I also tried segmenting the lines into words, but this didn't quite work for my task (historical datasets). If the text you want to recognize is nicely written and the gaps between words are much larger than the gaps between characters, then segmentation may work. An easy method is described in [1], which you can try for your task.
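For anyone wanting to try option (a), here is a minimal sketch of how full text lines could be mapped to labels once whitespace is part of the character set. The character set itself, the names `CHARS`, `encode_line` and `decode_labels`, and the choice of putting the CTC pseudo blank at the last index are assumptions for illustration, not taken from any particular implementation.

```python
import string

# Hypothetical character set: letters, digits, some punctuation,
# and (last) the "real" whitespace label.
CHARS = string.ascii_letters + string.digits + ".,'-!?" + " "
CHAR_TO_LABEL = {c: i for i, c in enumerate(CHARS)}
SPACE_LABEL = CHAR_TO_LABEL[" "]

# Assumption: the CTC pseudo blank gets one extra index after all real
# characters, so the network needs len(CHARS) + 1 output classes.
CTC_BLANK = len(CHARS)
NUM_CLASSES = len(CHARS) + 1


def encode_line(text):
    """Map a complete text line, spaces included, to label indices."""
    return [CHAR_TO_LABEL[c] for c in text]


def decode_labels(labels):
    """Map label indices (pseudo blanks already removed) back to text."""
    return "".join(CHARS[i] for i in labels)


labels = encode_line("I am a boy.")
```

Here the whitespace is an ordinary class like any letter, so the ground truth for a line image is simply the line text itself.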
@githubharald
Thanks a million.
Is the input of your CRNN model 800x64px? Did you modify the CRNN network or resize the images? From my point of view, the height of the images should be 32px.
@githubharald Thanks very much. Your idea is very helpful for me. I have trained the model following your steps, and it does well on images with simple backgrounds, especially documents.
But it works badly on other images: for example, the output is 'dogoodfor', so the whitespace is not recognized, although the words themselves are recognized very well.
I wonder if the training data is not suitable. I synthesize text with https://github.com/ankush-me/SynthText. First, I synthesized 2,000,000 English single words. Second, I synthesized 2,000,000 English short sentences. The short sentences come from three English novels, and every short sentence has at most three words and two spaces. The number of distinct short sentences is 50,000.
Do you have any idea why some whitespace is not recognized?
if you're using the original CRNN, then there is no whitespace label included (therefore it can't be recognized). Did you add the whitespace label already and map it to the corresponding class?
I use the original CRNN with a 400x32 input image. I generate training data with real whitespace, but replace the whitespace with another character in the label file before training; after testing, I replace that character with whitespace again. Also, how many words and whitespaces do you have in one text-line training image?
@githubharald
just a rough guess: 6 words/line, that means 5 whitespace labels. And around 10000 lines in the training set.
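To compare your own data against these rough numbers, a small sketch like the following could be used. The function name `line_stats` and the assumption that the labels are available as a plain list of strings are hypothetical.

```python
def line_stats(labels):
    """Return (mean words per line, mean whitespace labels per line)."""
    words = [len(text.split()) for text in labels]
    spaces = [text.count(" ") for text in labels]
    n = len(labels)
    return sum(words) / n, sum(spaces) / n


# A line with 6 words separated by single spaces has 5 whitespace labels.
stats = line_stats(["one two three four five six"])
```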
@githubharald Thanks very much. What precision do you get for whitespace in lines? In my experiments, the validation precision for whitespace is only 25%.
I get more than 90% (depending on dataset, parameters, ...). If you only get 25%, then there is something wrong. You replace a whitespace label by some other label. That's ok if you don't have this label in the dataset. E.g. if you use 'a' instead of ' ', then there must not be any word in the dataset containing 'a'. Is this true for your experiments?
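A quick way to rule out this kind of collision is to check the label file before substituting. This is only a sketch: the helper names (`find_free_placeholder`, `to_placeholder`, `from_placeholder`) and the candidate characters are made up for illustration.

```python
def find_free_placeholder(labels, candidates="|#~^"):
    """Return the first candidate character that occurs in no label, or None."""
    used = set("".join(labels))
    for c in candidates:
        if c not in used:
            return c
    return None


def to_placeholder(text, placeholder):
    """Replace whitespace by the placeholder; refuse if it would collide."""
    if placeholder in text:
        raise ValueError(f"placeholder {placeholder!r} already occurs in {text!r}")
    return text.replace(" ", placeholder)


def from_placeholder(text, placeholder):
    """Map the placeholder in a recognized text back to whitespace."""
    return text.replace(placeholder, " ")


train_labels = ["I am a boy.", "do good for"]
ph = find_free_placeholder(train_labels)
encoded = [to_placeholder(t, ph) for t in train_labels]
```

If `to_placeholder` raises for any line, the chosen character is already a real character in the dataset and the whitespace class would be mixed up with it.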
@githubharald
Yes. I suspect that I synthesized the training text images in a wrong way. I will try some other training images.
Again, thanks very much.
I calculate the background colour of the dataset image and fill the empty space with this colour.
How do you do this, can you explain? @githubharald
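One possible way to do something like this, as a sketch only and not necessarily what was done in the quoted comment: estimate the background from the border pixels of the (grayscale) image and paste the image onto a canvas of the target size filled with that value. The median-of-border estimate and the function names are assumptions, and scaling the image so that it fits the target is left out here.

```python
import numpy as np


def estimate_background(img):
    """Estimate the background gray value as the median of the border pixels."""
    border = np.concatenate([img[0, :], img[-1, :], img[:, 0], img[:, -1]])
    return int(np.median(border))


def pad_to_size(img, target_h, target_w):
    """Place a grayscale image at the top-left of a background-filled canvas."""
    h, w = img.shape
    if h > target_h or w > target_w:
        raise ValueError("image must be scaled to fit the target size first")
    canvas = np.full((target_h, target_w), estimate_background(img), dtype=img.dtype)
    canvas[:h, :w] = img
    return canvas


# Toy example: light page (200) with a dark stroke (0), padded to 800x64px.
img = np.full((32, 100), 200, dtype=np.uint8)
img[10:20, 30:60] = 0
padded = pad_to_size(img, target_h=64, target_w=800)
```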
> @githubharald Thanks very much. Your idea is very helpful for me. I have trained the model following your steps, and it does well on images with simple backgrounds, especially documents.
> But it works badly on other images: for example, the output is 'dogoodfor', so the whitespace is not recognized, although the words themselves are recognized very well.
> I wonder if the training data is not suitable. I synthesize text with https://github.com/ankush-me/SynthText. First, I synthesized 2,000,000 English single words. Second, I synthesized 2,000,000 English short sentences. The short sentences come from three English novels, and every short sentence has at most three words and two spaces. The number of distinct short sentences is 50,000.
> Do you have any idea why some whitespace is not recognized?
I met with the same issue as you did. Did you resolve this issue?
The pretrained CRNN was trained without whitespace, so please fine-tune it with samples that contain whitespace.
> I use the original CRNN with a 400x32 input image. I generate training data with real whitespace, but replace the whitespace with another character in the label file before training; after testing, I replace that character with whitespace again. Also, how many words and whitespaces do you have in one text-line training image?
Please share your experience... how many words per line gave good results?
Hello,
I'm wondering if it is possible to work on complete lines instead of just on a word level. It can be hard to segment words in a line as a preprocessing step (depending on how "nice" the handwritten text looks). So by adding a new class, which I call "real blank" (not the CTC pseudo blank), it should in theory be possible. Has anyone already tried this? In my experience so far, CTC works best with short sequences, but a line can be pretty long. Or is there some other way to avoid the preprocessing step of word segmentation?
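To make the difference between the two blanks concrete, here is a small sketch of CTC best-path decoding on a toy example. The three-character alphabet, the index of the pseudo blank and the function name `ctc_best_path` are assumptions for illustration. The pseudo blank is dropped during decoding, while the "real blank" is an ordinary class that survives as a space in the output.

```python
from itertools import groupby

# Toy alphabet: 'a', 'b' and the "real blank" (whitespace) as normal classes.
CHARS = "ab "
# Assumption: the CTC pseudo blank is the extra class after the real ones.
PSEUDO_BLANK = len(CHARS)


def ctc_best_path(path, chars, blank):
    """Collapse repeated labels, then remove the CTC pseudo blank."""
    collapsed = [label for label, _ in groupby(path)]
    return "".join(chars[label] for label in collapsed if label != blank)


# Per-time-step argmax labels: a a - ' ' ' ' - b b   ('-' is the pseudo blank)
path = [0, 0, 3, 2, 2, 3, 1, 1]
text = ctc_best_path(path, CHARS, PSEUDO_BLANK)  # "a b"
```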
How did you do this with the data from the ICFHR2016 Competition (*) in which CRNN was used?
(*) https://scriptnet.iit.demokritos.gr/competitions/4/