baopham1340 / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Devanagari - danda misrecognized in some cases #1330

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run tesseract on the attached images with the Devanagari traineddata.

What is the expected output? What do you see instead?
I expect DEVANAGARI DANDA (U+0964) to be recognized correctly. In some cases
it is recognized as an apostrophe ('), and in others as DEVANAGARI VOWEL SIGN
AA (U+093E).
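To locate the misreadings in the recognized text, here is a minimal sketch, assuming the output is read as a UTF-8 string (for example, from one of the attached .txt files). It flags lines whose last non-blank character is an apostrophe or U+093E, where a sentence-final danda would normally be expected. The function names and the line-end heuristic are hypothetical and not part of tesseract. A line that legitimately ends in the vowel sign AA, such as a heading without a danda, will also be flagged, so the results are only a hint.

```python
import unicodedata
from typing import List, Tuple

DANDA = "\u0964"  # DEVANAGARI DANDA, the expected character

# Characters the danda was reported to be misrecognized as.
SUSPECTS = {"'", "\u093E"}  # APOSTROPHE, DEVANAGARI VOWEL SIGN AA


def count_danda(text: str) -> int:
    """Count correctly recognized dandas in the OCR output."""
    return text.count(DANDA)


def flag_lines(text: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, Unicode name) for every line whose
    last non-blank character is a suspected danda misreading.
    Heuristic: assumes a danda would normally end the line."""
    flagged = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.rstrip()
        if not stripped:
            continue  # skip blank lines
        last = stripped[-1]
        if last in SUSPECTS:
            flagged.append((lineno, unicodedata.name(last)))
    return flagged
```

Usage: pass the contents of a .txt file opened with `encoding="utf-8"` to `flag_lines`, then compare the reported line numbers against the red circles in the matching .png.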

What version of the product are you using? On what operating system?

Latest version from git,
on Windows 8 under MSYS2.

Please provide any additional information below.

Files attached:
.tif are the original images
.png have the errors marked with red circles
.txt has the recognized text

Original issue reported on code.google.com by shreeshrii on 8 Oct 2014 at 4:06

Attachments: