charlesw / tesseract

A .Net wrapper for tesseract-ocr
Apache License 2.0

Errors when numeric and alphabetic data is mixed #454

Open ilochray opened 5 years ago

ilochray commented 5 years ago

I am using the API to read data from an image. I have created training files for the fonts I process, and I pre-process each image to deskew and clean it. When I read purely numeric data, e.g. 123456, it reads perfectly. When I read purely alphabetic data, e.g. ABCDEFGH, it also reads perfectly. The problem arises when the two are combined, e.g. 12ABC3456. In that case there are many errors (for example, B and 8 get confused).
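To make reports like "B and 8 get confused" concrete, it can help to tally which characters are substituted across a set of test images. The sketch below is a hypothetical helper, not part of the wrapper. It assumes each OCR result has the same length as its ground truth, so characters can be compared position by position. Pairs whose lengths differ are skipped rather than aligned. The sample strings are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def tally_confusions(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count (expected, recognized) character substitutions.

    Assumption: each OCR result has the same length as its ground truth.
    Pairs with different lengths (insertions/deletions) are skipped,
    since they would need a proper alignment such as edit distance.
    """
    confusions: Counter = Counter()
    for expected, recognized in pairs:
        if len(expected) != len(recognized):
            continue  # cannot compare position by position
        for e, r in zip(expected, recognized):
            if e != r:
                confusions[(e, r)] += 1
    return confusions


if __name__ == "__main__":
    samples = [
        ("12ABC3456", "12A8C3456"),  # hypothetical OCR output with B read as 8
        ("ABCDEFGH", "ABCDEFGH"),
        ("123456", "123456"),
    ]
    for (e, r), n in tally_confusions(samples).most_common():
        print(f"{e!r} read as {r!r}: {n}")
```

Running this over real ground-truth/OCR pairs shows which confusions dominate on mixed strings compared with the purely numeric or purely alphabetic ones.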

ilochray commented 5 years ago

I have tried setting load_system_dawg and load_freq_dawg to false, but that did not help. Are there any other configuration changes I can make?
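One way to check whether these two variables change the result independently of the .NET wrapper is to run the tesseract command-line tool directly. It accepts `-c NAME=VALUE` to override a parameter, and the output base `stdout` prints the text instead of writing a file. The helper below only builds the argument list. The function name, the `scan.png` image, and the `eng` language code are placeholders. Substitute the name of your own trained data file.

```python
import shlex
from typing import Dict, List, Optional


def build_tesseract_args(
    image_path: str,
    lang: str = "eng",
    variables: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build an argv list for the tesseract CLI (hypothetical helper).

    Each entry in `variables` becomes a `-c NAME=VALUE` pair. Boolean
    parameters such as load_system_dawg accept 0/1.
    """
    args = ["tesseract", image_path, "stdout", "-l", lang]
    for name, value in (variables or {}).items():
        args += ["-c", f"{name}={value}"]
    return args


if __name__ == "__main__":
    argv = build_tesseract_args(
        "scan.png",  # placeholder input image
        lang="eng",  # placeholder; use your trained data name
        variables={"load_system_dawg": "0", "load_freq_dawg": "0"},
    )
    print(shlex.join(argv))
    # To actually run it (requires tesseract on PATH):
    # import subprocess
    # text = subprocess.run(argv, capture_output=True, text=True, check=True).stdout
```

Comparing the CLI output with and without the `-c` flags on the same image shows whether the dictionaries are involved in the B/8 confusion at all.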