ecit241 / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Devanagari and Tamil - Recognition different for tam+san vs san+tam #1344

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run tesseract with the Devanagari (san) and Tamil (tam) traineddata on the attached image, once as san+tam and once as tam+san.

What is the expected output? What do you see instead?
The recognition differs depending on whether san+tam or tam+san is used.
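
For reference, the two invocations being compared can be sketched roughly as follows. This is only an illustration: the file name `input.tif` and the output base names are placeholders, not the names of the attached files, and the helper `tesseract_command` is hypothetical. It only builds the command lines; it does not run tesseract.

```python
from typing import List


def tesseract_command(image: str, output_base: str, langs: List[str]) -> List[str]:
    """Build a tesseract command line (hypothetical helper, nothing is executed)."""
    # -l takes the languages joined with '+'; their order is what this issue is about.
    return ["tesseract", image, output_base, "-l", "+".join(langs)]


# Placeholder file names; substitute the real input image and output base.
for order in (["san", "tam"], ["tam", "san"]):
    cmd = tesseract_command("input.tif", "out_" + "_".join(order), order)
    print(" ".join(cmd))
```

Running both commands on the same image and diffing the two output text files shows where the recognition diverges.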

What version of the product are you using? On what operating system?
Latest version from git, MSYS2, Windows 8.

Please provide any additional information below.
The TIF input and the recognized text for both options are attached.

Original issue reported on code.google.com by shreeshrii on 15 Oct 2014 at 12:34

Attachments:

GoogleCodeExporter commented 9 years ago
This is intended behavior.
The first specified language takes priority until the text changes to another
language; after that, there is hysteresis. It is highly imperfect, but
reasonably efficient that way.
It will do better when the overall recognition accuracy is better.
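
To make the order dependence concrete, here is a toy model of "first language has priority, then hysteresis". It is not Tesseract's actual implementation: the function `pick_languages`, the `margin` threshold, and the per-word confidence values are all assumptions made up for illustration.

```python
from typing import Dict, List


def pick_languages(word_scores: List[Dict[str, float]],
                   langs: List[str],
                   margin: float = 0.2) -> List[str]:
    """Toy model: keep the current language unless a rival beats it by `margin`."""
    current = langs[0]  # the first language listed starts with priority
    chosen = []
    for scores in word_scores:
        best = max(langs, key=lambda lang: scores[lang])
        # Hysteresis: switch only when the rival is clearly better.
        if best != current and scores[best] - scores[current] > margin:
            current = best
        chosen.append(current)
    return chosen


# Hypothetical per-word confidences; words 1, 2 and 4 are ambiguous.
scores = [
    {"san": 0.50, "tam": 0.55},
    {"san": 0.52, "tam": 0.50},
    {"san": 0.10, "tam": 0.90},
    {"san": 0.45, "tam": 0.50},
]

print(pick_languages(scores, ["san", "tam"]))
print(pick_languages(scores, ["tam", "san"]))
```

In this sketch the ambiguous words stay with whichever language currently holds priority, so the same scores yield different results for san+tam and tam+san. Better per-word confidence leaves fewer ambiguous words, which is consistent with the remark that the behavior improves with overall accuracy.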

Original comment by theraysm...@gmail.com on 4 Nov 2014 at 10:02