AmitGorvadiya / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

nobatch digits - tessedit_char_whitelist ignored #196

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Invoke tesseract on a sample of digits only, without the "nobatch
digits" option.
2. Some of the digits may be recognised as letters. In my sample, 0 was
recognised as D.
3. Edit the configs/digits file. It currently reads
"tessedit_char_whitelist 0123456789".
4. Change it to read "tessedit_char_whitelist D123456789".
5. Invoke tesseract on your sample of digits only.

What is the expected output? What do you see instead?
I would expect to see all zeros from my sample interpreted as "D". Instead,
I get a "0", which happens to be correct but there is no "0" in my whitelist.

Why is this a problem?
It seems that the whitelist is being ignored completely. This also appeared
to be the case when I added configs/capitals and invoked with "nobatch
capitals" which contained ABCDEFG... The output wasn't restricted to the
whitelist.
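One way to confirm that the output really escapes the whitelist, rather than judging by eye, is to post-check the recognised text. The sketch below is a standalone helper and not part of Tesseract; the name charsOutsideWhitelist is hypothetical. It assumes the OCR output has already been read into a string, and it ignores whitespace because line breaks and spaces are not whitelist characters.

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper (not a Tesseract API): returns the distinct
// non-whitespace characters of `output` that do not appear in `whitelist`,
// in ascending character order. An empty result means the output stayed
// inside the whitelist.
std::string charsOutsideWhitelist(const std::string& output,
                                  const std::string& whitelist) {
    std::set<char> offenders;
    for (char c : output) {
        // Skip spaces and line breaks produced by the OCR layout.
        if (std::isspace(static_cast<unsigned char>(c))) continue;
        if (whitelist.find(c) == std::string::npos) offenders.insert(c);
    }
    return std::string(offenders.begin(), offenders.end());
}
```

With the whitelist from step 4, a call such as charsOutsideWhitelist(text, "D123456789") should return "0" for any output that still contains a zero, which is the symptom described above.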

What version of the product are you using? On what operating system?
2.01 on Windows Vista

Original issue reported on code.google.com by paulfeak...@gmail.com on 19 Mar 2009 at 10:44

GoogleCodeExporter commented 9 years ago
Actually I've downloaded the latest version from the trunk and compiled with VC++ Express 2008.

It seems that after I have invoked the command, whatever changes I have made to my digits file are overwritten and it reverts back to:
"tessedit_char_whitelist 0123456789"
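To check whether the file really is rewritten, the whitelist value could be read back before and after the run and compared. This is only a sketch: configValue is a hypothetical helper, not a Tesseract function, and it assumes the config file consists of whitespace-separated "name value" lines like the one quoted above.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not a Tesseract API): scans config text made of
// "name value" lines and returns the value given for `name`, or an empty
// string if the name never appears. If the name occurs more than once,
// the last occurrence wins.
std::string configValue(std::istream& in, const std::string& name) {
    std::string line;
    std::string found;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key, value;
        // Whitespace-delimited extraction also drops a trailing '\r'
        // left over from Windows line endings.
        if (fields >> key >> value && key == name) found = value;
    }
    return found;
}
```

Opening the digits file with std::ifstream and calling configValue(file, "tessedit_char_whitelist") once before and once after invoking tesseract would show whether the value changed from "D123456789" back to "0123456789".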

Any ideas?
Paul.

Original comment by paulfeak...@gmail.com on 19 Mar 2009 at 1:10

GoogleCodeExporter commented 9 years ago
I've tested on Ubuntu 8.10 with a patched version 2.03 - everything works perfectly.

It also works on Vista if you drop "nobatch digits" and set the whitelist through the API instead:
TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789/");

Seems like tesseract on Vista doesn't read external config files correctly though?

Original comment by paulfeak...@gmail.com on 20 Mar 2009 at 11:31

GoogleCodeExporter commented 9 years ago
I am changing the status to fixed: I tested it on Windows XP with tesseract 3.01 and it (tesseract eurotext.tif eurotext digits) works as expected.

Original comment by zde...@gmail.com on 19 Feb 2012 at 3:35