This is a patch implementing a feature I described on the tesseract-dev mailing
list here:
https://groups.google.com/d/msgid/tesseract-dev/20140501195706.GB31269%40manta.lan
It adds a variable, tessedit_char_unblacklist, that re-enables characters that
have been blacklisted using tessedit_char_blacklist.
This is useful when a character should generally be blacklisted because it is
uncommon, but being able to enable it would be very handy in some cases. An
example is the Greek numeral character (ʹ), which can easily be mistaken for
an apostrophe (’). Most Ancient Greek books don't use the Greek
numeral character, so it would be useful to disable it in the grc.config using
tessedit_char_blacklist, and allow users to enable it using
tessedit_char_unblacklist if they know they're OCRing a text that is likely to
use it.
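
The intended interaction between the two variables can be sketched roughly as
follows. This is only an illustration of the semantics described above, not the
code in the patch: the function name enabled_chars and the tiny sample character
set are made up for the example, and any interaction with
tessedit_char_whitelist is ignored.

```python
# Hypothetical model of the described behaviour; not Tesseract's implementation.

GREEK_NUMERAL_SIGN = "\u0374"  # U+0374 GREEK NUMERAL SIGN


def enabled_chars(charset, blacklist="", unblacklist=""):
    """Return the characters of `charset` that remain enabled.

    A character is disabled if it is in `blacklist`, unless it also
    appears in `unblacklist`, which re-enables it.
    """
    disabled = set(blacklist) - set(unblacklist)
    return [c for c in charset if c not in disabled]


# Illustrative character set only; a real unicharset is far larger.
charset = ["α", "β", "’", GREEK_NUMERAL_SIGN]

# Default behaviour: the config blacklists the numeral sign.
default_chars = enabled_chars(charset, blacklist=GREEK_NUMERAL_SIGN)

# A user who expects numerals in the text re-enables it.
user_chars = enabled_chars(
    charset,
    blacklist=GREEK_NUMERAL_SIGN,
    unblacklist=GREEK_NUMERAL_SIGN,
)
```

With the blacklist alone the numeral sign is dropped from the enabled set; adding
the same character to the unblacklist brings it back without the user having to
edit the blacklist in grc.config.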
Original issue reported on code.google.com by nick.wh...@durham.ac.uk on 22 May 2014 at 8:27