ehikz / pytesser

Automatically exported from code.google.com/p/pytesser

How to get whitelist to work with pytesseract #23

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?

I am trying to set a character whitelist for Tesseract with code like the following:

ocr = tesseract.TessBaseAPI()
ocr.SetVariable("tessedit_char_whitelist", "0123456789;")
ocr.SetPageSegMode(tesseract.PSM_AUTO)
ocr.Init("C:\\Program Files (x86)\\Tesseract-OCR\\","eng",tesseract.OEM_DEFAULT)

What is the expected output? What do you see instead?

Expected: only the characters "0123456789;" are recognized when calling the
image_to_string() function.  Actual: with code like the above,
image_to_string() ignores the whitelist and returns whatever characters it finds.

What version of the product are you using? On what operating system?

pytesseract-0.1, Python 2.7, Windows 8.1

Please provide any additional information below.

I have tried the approaches people recommend for Tesseract-OCR itself, but none
of them work with pytesseract.  I have not been able to find any way to apply a
whitelist to the image_to_string() function; one would be immensely helpful in
improving the accuracy of the function.

Thanks in advance for any help on the matter.

Original issue reported on code.google.com by darke...@yahoo.com on 9 Jun 2015 at 6:58
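
A possible direction, offered as a sketch rather than a confirmed fix: pytesseract's image_to_string() runs the tesseract command-line executable, so variables set on a separate tesseract.TessBaseAPI() object are not seen by it. image_to_string() does accept a config string that is passed through to that executable. The example below assumes the installed tesseract build supports the "-c configvar=value" option; the helper names whitelist_config and ocr_with_whitelist are hypothetical and not part of pytesseract.

```python
def whitelist_config(chars):
    """Build a config string asking tesseract to recognize only `chars`.

    Assumption: the tesseract executable accepts `-c configvar=value`.
    """
    return "-c tessedit_char_whitelist=" + chars


def ocr_with_whitelist(image_path, chars):
    """Hypothetical wrapper: OCR an image file using a character whitelist."""
    # Imported here so whitelist_config() can be used without the OCR stack.
    from PIL import Image
    import pytesseract

    image = Image.open(image_path)
    # `config` is forwarded to the tesseract command line by pytesseract.
    return pytesseract.image_to_string(image, config=whitelist_config(chars))
```

Usage would look like ocr_with_whitelist("digits.png", "0123456789;"), where "digits.png" is a placeholder file name. Whether the whitelist is honored may still depend on the tesseract version and OCR engine in use.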