[Feature] Add tessedit_char_unblacklist variable

This is a patch implementing a feature I described on the tesseract-dev mailing 
list here: 
https://groups.google.com/d/msgid/tesseract-dev/20140501195706.GB31269%40manta.lan

It adds a variable, tessedit_char_unblacklist, that re-enables characters that 
have been blacklisted using tessedit_char_blacklist.
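
To make the intended behaviour concrete, here is a minimal sketch in Python. It is not the code from the patch: `enabled_chars` is a hypothetical helper, and the precedence it shows (apply the blacklist first, then switch un-blacklisted characters back on) is my assumption about how the two variables should interact.

```python
def enabled_chars(charset, blacklist="", unblacklist=""):
    """Return the set of characters the recogniser may output.

    Hypothetical helper, not Tesseract code. Assumed semantics:
    blacklisted characters are disabled first, then any character
    listed in `unblacklist` is enabled again.
    """
    enabled = set(charset) - set(blacklist)
    # Only characters that exist in the character set can be re-enabled.
    enabled |= set(unblacklist) & set(charset)
    return enabled


charset = "abc'"
# The config blacklists the apostrophe and nothing is un-blacklisted.
assert "'" not in enabled_chars(charset, blacklist="'")
# The user un-blacklists it again, so it becomes available.
assert "'" in enabled_chars(charset, blacklist="'", unblacklist="'")
```

Under this assumption, un-blacklisting a character that was never blacklisted, or that is not in the character set at all, has no effect.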

This is useful when a character should generally be blacklisted because it is 
uncommon, but users need the option to enable it in some cases. An example is 
the Greek numeral character (ʹ), which can easily be mistaken for an 
apostrophe (’). Most Ancient Greek books don't use the Greek numeral character, 
so it would be useful to disable it in grc.config using 
tessedit_char_blacklist, and let users enable it with 
tessedit_char_unblacklist if they know they're OCRing a text that is likely to 
use it.
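
As a rough illustration of that workflow, the sketch below builds a tesseract command line that sets the new variable through the existing `-c VAR=VALUE` option. It assumes the patch is applied and that grc.config contains a line such as `tessedit_char_blacklist ʹ`. The helper name `tesseract_command` and the file names are made up for the example.

```python
def tesseract_command(image, output_base, lang="grc", unblacklist=""):
    """Build an argument list for the tesseract command-line tool.

    Hypothetical helper. `unblacklist` is passed via `-c`, which only
    has an effect if tessedit_char_unblacklist exists in the build.
    """
    cmd = ["tesseract", image, output_base, "-l", lang]
    if unblacklist:
        cmd += ["-c", f"tessedit_char_unblacklist={unblacklist}"]
    return cmd


# Default run: the Greek numeral character stays blacklisted by grc.config.
print(tesseract_command("page.png", "page"))
# Text known to contain Greek numerals: switch the character back on.
print(tesseract_command("page.png", "page", unblacklist="ʹ"))
```

The resulting list can be handed to `subprocess.run` as it is, so the character never has to pass through shell quoting.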

Original issue reported on code.google.com by nick.wh...@durham.ac.uk on 22 May 2014 at 8:27

Attachments:

unblacklist.patch
