Hey,
You're probably scanning the image without cropping it well first (which could be done with CRAFT). The LSTM is affected by the white space in between.
No CRAFT:
With CRAFT:
Note that CRAFT reads the upper of the two textboxes it finds first, rather than the leftmost one, meaning it starts in the middle this time. This can be manually fixed by sorting the detected boxes by their x coordinates (a sketch follows below), but that's not really relevant to this question. Anyway, just use CRAFT.
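A minimal sketch of that sorting fix, assuming `polys` is what CRAFT returns: one (4, 2) array of corner points per textbox (the shapes and sample values here are illustrative, not the repo's exact API):

```python
import numpy as np

# Hypothetical CRAFT output: two textboxes, each as four (x, y) corners.
polys = [
    np.array([[120, 10], [200, 10], [200, 40], [120, 40]]),  # right-hand box
    np.array([[10, 12], [100, 12], [100, 42], [10, 42]]),    # left-hand box
]

# Sort the boxes left to right by each polygon's smallest x coordinate,
# so the recognizer reads them in natural order instead of top-down.
polys_sorted = sorted(polys, key=lambda p: p[:, 0].min())
```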
Can you please tell me how to use CRAFT with this?
@marcoromelli Hello, I guess this is a problem of training data (with over 80% confidence). MJSynth and SynthText have little 'long numeric text'. You can download MJSynth and check its vocabulary; also, the corpus of SynthText is made from a newsgroup dataset, which has little 'long numeric text'. I didn't investigate them thoroughly, but I guess this is the reason.
Thus, if you want to recognize 'long numeric texts' (which have simpler backgrounds and regular shapes), generate or gather some 'long numeric texts' and train the model with them; a minimal generation sketch follows below. The model accuracy would be much better.
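A minimal sketch of generating such samples, assuming plain white backgrounds and PIL's default font; the output directory and the idea of pairing the returned label with the image path are illustrative, not the exact format the training code expects:

```python
import os
import random
from PIL import Image, ImageDraw, ImageFont

def make_digit_sample(idx, out_dir="synthetic_digits", min_len=8, max_len=16):
    """Render one random long digit string on a plain strip and save it."""
    os.makedirs(out_dir, exist_ok=True)
    label = "".join(random.choices("0123456789", k=random.randint(min_len, max_len)))
    img = Image.new("L", (12 * len(label) + 20, 32), color=255)  # white strip
    ImageDraw.Draw(img).text((10, 10), label, fill=0, font=ImageFont.load_default())
    img.save(os.path.join(out_dir, f"{idx:06d}.png"))
    return label  # pair this with the image path in your ground-truth file
```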
Or, as @YacobBY did, using CRAFT to split long text (or just detect 2 boxes in 1 word) is also a good way :)
Best.
@Sayyam-Jain I don't have the original code of the normal integration anymore, but the gist of what I did when I made my cropper is to take some of the poly values from the test_net method and crop to those. These were poly[1], poly[5], poly[0], poly[2]. Note that the images are in PIL format here.
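A minimal sketch of that crop, assuming each poly from test_net is flattened to [x1, y1, x2, y2, x3, y3, x4, y4] (corners clockwise from top-left), which makes poly[0], poly[1], poly[2], poly[5] the left, top, right, and bottom of an axis-aligned box:

```python
from PIL import Image

def crop_textbox(pil_image, poly):
    """Crop a PIL image to the axis-aligned box spanned by one CRAFT poly."""
    left, top = poly[0], poly[1]  # top-left corner
    right = poly[2]               # x of the top-right corner
    bottom = poly[5]              # y of the bottom-right corner
    return pil_image.crop((int(left), int(top), int(right), int(bottom)))

# Hypothetical usage, flattening each (4, 2) poly first:
# crops = [crop_textbox(img, p.flatten()) for p in polys]
```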
Thanks all for the contribution.
@YacobBY The image I attached is already the output of CRAFT on a bigger image; do you think applying CRAFT again to it makes sense in general? I'm not really sure about this. It's true that some splitting strategy would help, but it's hard to implement something robust to different font and background styles.
@ku21fan I had some experience with a CRNN model trained on the same synthetic data which never showed issues with long sequences (it did show issues with curved text), but you're right that this is probably the issue here. I will try to generate some new data specifically for long numeric sequences. What do you think would be a good number of images?
@marcoromelli For me CRAFT finds two separate textboxes directly, instead of one big textbox like yours. The most important thing when applying the OCR is that the image is cropped as tightly to the text as possible. You could also try manually cropping this image to exactly the text to see if you get better results.
I added a simple test image to the demo folder and it gets wrongly recognized.
The image:
The output: 128456782012
The command I used is the same as the one in the README. Thinking that the problem could be related to the string length, I tried retraining the TPS-ResNet-BiLSTM-Attn model with an imgW of 200 pixels, but the results are very similar.
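For reference, the invocation follows the repo's README demo command, roughly of this form (checkpoint name as distributed with the repo; double-check the flags against your copy of demo.py):

```bash
CUDA_VISIBLE_DEVICES=0 python3 demo.py \
  --Transformation TPS --FeatureExtraction ResNet \
  --SequenceModeling BiLSTM --Prediction Attn \
  --image_folder demo_image/ \
  --saved_model TPS-ResNet-BiLSTM-Attn.pth
```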
Any idea why this happens? This image seems much simpler to me than the other demo images.