Open GoogleCodeExporter opened 9 years ago
Another sample page where one word is recognized inconsistently with psm 6.
The accurate recognition is 'नामावलिः'.
The recognized text includes that form and many other variations, such as:
नामावतिःन्
नामावलिः>
नामावळिः
नामावतिः
नामावठिः
नामावठिः '
Original comment by shreeshrii
on 12 Oct 2014 at 3:18
Does it still happen if you create a unicharambigs file that maps those
erroneous outputs to the correct characters?
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
The last file (unicharambigs)
The final data file that Tesseract uses is called unicharambigs. It describes
possible ambiguities between characters or sets of characters, and is manually
generated. To understand the file format, look at the following example:
Example line     Explanation
2 ' ' 1 " 1      A double quote (") should be substituted whenever 2 consecutive single quotes (') are seen.
1 m 2 r n 0      The characters 'rn' may sometimes be recognized incorrectly as 'm'.
3 i i i 1 m 0    The character 'm' may sometimes be recognized incorrectly as the sequence 'iii'.
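As a rough illustration of the format quoted above, here is a minimal Python sketch that builds such lines. The helper name format_ambig is hypothetical, and the tab separator between fields is an assumption about the on-disk format (the wiki excerpt shows the fields separated by whitespace). The Devanagari entries are only guesses based on the misrecognitions listed in this issue; each unit must match an entry in the trained unicharset, which may use larger clusters than single code points.

```python
def format_ambig(seen, intended, mandatory=False, sep="\t"):
    """Build one unicharambigs-style line.

    seen      -- list of units as the recognizer output them (the error)
    intended  -- list of units that should be considered instead
    mandatory -- True gives type 1 (always substitute), False gives type 0
    sep       -- field separator; a tab is assumed here, not confirmed
    """
    if not seen or not intended:
        raise ValueError("both sides of an ambiguity need at least one unit")
    fields = [
        str(len(seen)),       # number of units in the recognized sequence
        " ".join(seen),       # the recognized units, space separated
        str(len(intended)),   # number of units in the replacement
        " ".join(intended),   # the replacement units, space separated
        "1" if mandatory else "0",
    ]
    return sep.join(fields)


# The three examples from the wiki excerpt.
wiki_lines = [
    format_ambig(["'", "'"], ['"'], mandatory=True),
    format_ambig(["m"], ["r", "n"]),
    format_ambig(["i", "i", "i"], ["m"]),
]

# Hypothetical entries for this issue: the letter ल in 'नामावलिः' coming
# out as ळ, त or ठ. Whether these match the unicharset is not verified.
issue_lines = [format_ambig([wrong], ["ल"]) for wrong in ["ळ", "त", "ठ"]]
```

The resulting lines would be written to the unicharambigs file as UTF-8 text, one per line. Type 0 is used for the Devanagari entries because ळ, त and ठ are legitimate letters elsewhere, so a mandatory substitution would likely corrupt other words.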
Original comment by dalbirsi...@googlemail.com
on 4 Feb 2015 at 11:28
Original issue reported on code.google.com by
shreeshrii
on 10 Oct 2014 at 8:21