ecit241 / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Tesseract OCR force pattern #1500

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Follow the bazaar tutorial
2. Test with a simple image and the pattern TEST\A\A\d\d\d
3. The result is not filtered

What is the expected output? What do you see instead?

Expected : TESTAB123

See : TESTAB123
      TESTABC12
      TESTA1234
      TEST12345
      TESTABCD1

What version of the product are you using? On what operating system?
Tesseract 3
Windows 8

I want to read a specific character sequence with Tesseract which contains the 
word "TEST" followed by 2 letters and 3 digits.

I have tried the pattern matching from the bazaar tutorial in Tesseract with the pattern

TEST\A\A\d\d\d

but the OCR still recognizes other words which do not match.

I have tried to use the "tessedit_char_whitelist" parameter but I can't choose 
the position of the characters with that.

I run the command: tesseract image.jpg result -l eng bazaar
I get no error message, just:

"Tesseract Open Source OCR Engine v3.01 with Leptonica"

The result: TESTAB123 TESTABC12 TESTA1234 TEST12345 TESTABCD1

So the result is wrong: I only wanted to catch the sequence "TESTAB123".
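
For comparison, the filtering expected here can be sketched as a post-processing step on the OCR output. This is only a minimal Python sketch, not something Tesseract does by itself: it assumes the recognized words are separated by whitespace, and that \A in the user pattern stands for one uppercase letter and \d for one digit. The name filter_matches is a hypothetical helper, not part of Tesseract.

```python
import re

# Regex counterpart of the user pattern TEST\A\A\d\d\d, under the assumption
# that \A means one uppercase letter and \d means one digit.
PATTERN = re.compile(r"TEST[A-Z]{2}\d{3}")


def filter_matches(ocr_text):
    """Return the whitespace-separated tokens that match PATTERN in full."""
    return [token for token in ocr_text.split() if PATTERN.fullmatch(token)]


if __name__ == "__main__":
    # The text Tesseract produced for the test image in this report.
    sample = "TESTAB123 TESTABC12 TESTA1234 TEST12345 TESTABCD1"
    print(filter_matches(sample))
```

With the output shown above, only "TESTAB123" passes this filter, which is the result the pattern file was expected to give.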

Can somebody tell me why the regular expression in my user-patterns file has no 
effect? For the configuration, I have STRICTLY followed the bazaar tutorial.

Original issue reported on code.google.com by leopold....@gmail.com on 10 Aug 2015 at 7:23
