ecit241 / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Hindi - modification to hin.config file #1355

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Currently https://code.google.com/p/tesseract-ocr/source/browse/hin/hin.config?repo=langdata has
# Avoid false positive 1 in place of the Hindi | character
tessedit_char_blacklist 1
2. Hindi also uses the Arabic numerals 0-9, so the digit 1 will not be 
recognized as long as the above line is in the config file.
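For illustration only, here is a minimal Python sketch of the kind of change being requested: it takes the text of a Tesseract config file and removes one character from any tessedit_char_blacklist entry, dropping the entry entirely if nothing is left to blacklist. The helper name drop_from_blacklist is hypothetical and is not part of Tesseract; the sketch only edits text and assumes the config uses the "name value" line format shown above. It does not touch the traineddata file itself.

```python
def drop_from_blacklist(config_text: str, char: str = "1") -> str:
    """Remove `char` from any tessedit_char_blacklist line in a config.

    Hypothetical helper for illustration; not part of Tesseract.
    """
    out = []
    for line in config_text.splitlines():
        # Config lines have the form "<variable> <value>".
        parts = line.split(None, 1)
        if parts and parts[0] == "tessedit_char_blacklist":
            remaining = parts[1].replace(char, "") if len(parts) > 1 else ""
            if not remaining.strip():
                # Nothing left to blacklist, so drop the whole line.
                continue
            line = f"tessedit_char_blacklist {remaining}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "# Avoid false positive 1 in place of the Hindi | character\n"
        "tessedit_char_blacklist 1\n"
    )
    print(drop_from_blacklist(sample), end="")
```

Comment lines are left untouched, so the explanatory comment would still need to be removed or updated by hand.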

What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?

Please provide any additional information below.

Original issue reported on code.google.com by shreeshrii on 25 Oct 2014 at 1:09

GoogleCodeExporter commented 9 years ago
This can't be addressed until we can retrain Hindi. Punting until 3.05.

Original comment by theraysm...@gmail.com on 4 Nov 2014 at 10:30

GoogleCodeExporter commented 9 years ago
Is the problem just with training Hindi, or with all Devanagari-based scripts?

Does this mean that there will be no new traineddata for Hindi in 3.04?

Original comment by shreeshrii on 5 Nov 2014 at 4:39

GoogleCodeExporter commented 9 years ago
Moved to github: https://github.com/tesseract-ocr/langdata/pull/7

Original comment by joregan on 13 May 2015 at 6:21