Closed Viton-zizu closed 10 years ago
Try encoding the value as ASCII using Unicode escape sequences. I thought I'd made this happen automatically, but that code must be in the 3.03 branch. On 11 Sep 2014 09:22, "Viton-zizu" notifications@github.com wrote:
This "SetVariable" call does not work. How can I whitelist characters? engine.SetVariable("tessedit_char_whitelist", "АБВГД...etc");
— Reply to this email directly or view it on GitHub https://github.com/charlesw/tesseract/issues/120.
I tried this and it does not work: engine.SetVariable("tessedit_char_whitelist", "\u0410"); where "\u0410" is "А", a Russian letter.
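For what it's worth, here is a small sketch of why the escaped form behaves the same as the literal one. The C# compiler resolves `\u0410` to `А` before the string ever reaches `SetVariable`, so both spellings pass identical text to the engine. The `ToUnicodeEscapes` helper is hypothetical (it is not part of the wrapper) and only prints the code units for inspection.

```csharp
using System;
using System.Linq;

class WhitelistEscapes
{
    // Hypothetical helper: renders each UTF-16 code unit as a \uXXXX escape.
    static string ToUnicodeEscapes(string s) =>
        string.Concat(s.Select(c => $"\\u{(int)c:X4}"));

    static void Main()
    {
        string literal = "АБВГД";                          // Cyrillic letters typed directly
        string escaped = "\u0410\u0411\u0412\u0413\u0414"; // same letters as escapes

        // Both spellings compile to the same string value.
        Console.WriteLine(literal == escaped);        // True
        Console.WriteLine(ToUnicodeEscapes(literal)); // \u0410\u0411\u0412\u0413\u0414
    }
}
```

If the whitelist is still ignored with either spelling, the cause is more likely how or when the value reaches the native library than how the string is written in source.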
I have the same problem with Russian characters. Converting to UTF-8 doesn't help, and version 3.03 doesn't help either.
But I think I know the solution. Check out this StackOverflow question: http://stackoverflow.com/questions/9794029/python-tesseract-ocr-get-digits-only
The line
SetVariable("tessedit_char_whitelist", someChars);
should be run before initializing.
In the previous version of the Tesseract wrapper, the initialization method existed separately from the constructor, so I could do this:
engine = new TesseractEngine(@"./tessdata", "rus", EngineMode.Default);
engine.SetVariable("tessedit_char_whitelist", rusChars);
engine.Init();
But in the current version I can't do this, because the initialization was moved into the constructor. Please fix it.
Okay, I'm going to have a look into this today. (In reply to Andrey Akinshin, 20 Sep 2014.)
This is the same issue as #68. I'll backport the fix from 3.03 and see if that helps.
It works now, thanks. Can you merge it into master branch and publish via NuGet?
Yes, I'll look at doing that tomorrow. I want to do a little more testing first, as there have been quite a few changes since the last release. (In reply to Andrey Akinshin, 20 Sep 2014.)
Great!