cocogua / tesseractdotnet

Automatically exported from code.google.com/p/tesseractdotnet

whitelist being ignored #15

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
var _ocr = new TesseractProcessor();
_ocr.SetPageSegMode(ePageSegMode.PSM_SINGLE_CHAR);
_ocr.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");
_ocr.Init(Program.AppPath + "tessdata\\", "eng", (int)Enums.EOcrEngineMode.TesseractOnly);

I'm expecting only alphanumeric output, but I'm getting all kinds of other characters, such as !*&() etc.

I've read that the blacklist overrides the whitelist if the blacklist is null or empty. Does this mean that the whitelist is ignored if the blacklist isn't specified?

What version of the product are you using? On what operating system?
r591 on Windows 7

Please provide any additional information below.

I also need to get the confidence level for each recognized character.

Original issue reported on code.google.com by sean.p.t...@gmail.com on 24 Aug 2011 at 5:52

GoogleCodeExporter commented 9 years ago
As far as I know, Tesseract itself behaves the way your code assumes: it needs all parameters, such as the whitelist and the page segmentation mode, to be set first. tesseractdotnet, however, seems to need it the other way around. My own solution calls Init first and then sets all the necessary parameters. For these parameters to take effect, your code should look like this:

var _ocr = new TesseractProcessor();
_ocr.Init(Program.AppPath + "tessdata\\", "eng", (int)Enums.EOcrEngineMode.TesseractOnly);
_ocr.SetPageSegMode(ePageSegMode.PSM_SINGLE_CHAR);
_ocr.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");

Good luck!          

Original comment by tanelte...@gmail.com on 24 Aug 2011 at 6:30

GoogleCodeExporter commented 9 years ago
Thanks! That did the trick.

Original comment by sean.p.t...@gmail.com on 25 Aug 2011 at 4:47