clovaai / deep-text-recognition-benchmark

Text recognition (optical character recognition) with deep learning methods, ICCV 2019
Apache License 2.0

How can I assign a token to all the characters that are not in opt.character? #282

Open mjack3 opened 3 years ago

mjack3 commented 3 years ago

Hello community. I am working on a project to recognize only numbers.

My recognizer is trained with the characters "0123456789". In addition, I created a pipeline that combines a text detector with a deep-text-recognition-benchmark model to do the OCR task.

When the text contains only numbers it works fine, but when the text detector captures non-numeric content, my deep-text-recognition-benchmark model assigns digits to the letters (which makes sense).

For instance, for the captured text "H23", I would like to get something like "[UNK]23" or "_23" instead of "723".

Could you give me a suggestion?

GTHell commented 3 years ago

The problem lies with the text detector. If you want to recognize only numbers, the text detector should output only the boxes that contain numbers. Why don't you just train on English text and do the filtering in post-processing?
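
A minimal sketch of what such a post-processing step could look like, assuming the recognizer has been retrained on the full alphanumeric character set and returns its prediction as a plain string. The function name `mask_unknown` and its parameters are hypothetical and are not part of deep-text-recognition-benchmark.

```python
# Hypothetical post-processing helper; not part of the repository.
DIGITS = "0123456789"


def mask_unknown(text: str, allowed: str = DIGITS, unk: str = "[UNK]") -> str:
    """Replace every character that is not in `allowed` with the `unk` token."""
    allowed_set = set(allowed)  # set lookup keeps the per-character check cheap
    return "".join(ch if ch in allowed_set else unk for ch in text)


# Assumed input: the string predicted by a model trained on letters and digits.
print(mask_unknown("H23"))           # [UNK]23
print(mask_unknown("H23", unk="_"))  # _23
```

This only works if the recognizer can actually output "H" in the first place, which is why the model would need to be trained on more than "0123456789".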