AiPacino / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

[Feature] Add tessedit_char_unblacklist variable #1207

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
This is a patch implementing a feature I described on the tesseract-dev mailing
list here:
https://groups.google.com/d/msgid/tesseract-dev/20140501195706.GB31269%40manta.lan

It adds a variable, tessedit_char_unblacklist, that re-enables characters that
have been blacklisted using tessedit_char_blacklist.

This is useful when a character should generally be blacklisted because it is
uncommon, but being able to re-enable it is very handy in some cases. An
example is the Greek numeral character (ʹ), which can easily be mistaken for an
apostrophe (’). Most Ancient Greek books don't use the Greek numeral character,
so it would be useful to disable it in grc.config using
tessedit_char_blacklist, and let users re-enable it with
tessedit_char_unblacklist when they know they're OCRing a text that is likely
to use it.
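
To make the intended behaviour concrete, here is a rough sketch in Python of how the two variables are meant to interact. It is only a model of the semantics described above, not the patch itself: the function name enabled_chars and the tiny example character set are made up for illustration, and the real code works on Tesseract's unicharset rather than on Python strings.

```python
def enabled_chars(charset, blacklist="", unblacklist=""):
    """Model of the proposed semantics (hypothetical helper, not Tesseract code).

    charset     -- every character the language model knows about
    blacklist   -- characters disabled via tessedit_char_blacklist
    unblacklist -- characters re-enabled via tessedit_char_unblacklist
    """
    known = set(charset)
    enabled = known - set(blacklist)
    # Unblacklisting can only bring back characters the model already knows.
    enabled |= set(unblacklist) & known
    return enabled


# Illustrative character set: three Greek letters, the Greek numeral
# character and the apostrophe from the example above.
charset = "αβγʹ’"
numeral = "ʹ"

default = enabled_chars(charset, blacklist=numeral)
opted_in = enabled_chars(charset, blacklist=numeral, unblacklist=numeral)

print(numeral in default)   # False: blacklisted by the language config
print(numeral in opted_in)  # True: re-enabled by the user
```

The point of modelling unblacklist as a separate step is that the user does not need to know or repeat the rest of the blacklist from grc.config; they only name the characters they want back.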

Original issue reported on code.google.com by nick.wh...@durham.ac.uk on 22 May 2014 at 8:27

GoogleCodeExporter commented 9 years ago
This issue was closed by revision f927728169cd.

Original comment by theraysm...@gmail.com on 9 Oct 2014 at 8:30