baopham1340 / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Multi underline character problem #1430

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Create an office document containing a run of underscores, e.g. "test ___ test".
2. Export it as PDF.
3. Convert the PDF to TIFF.
4. Run tesseract on the TIFF file.
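
Steps 3 and 4 above can be scripted roughly as follows. This is a minimal sketch, not taken from the original report: it assumes Ghostscript (`gs`) performs the PDF-to-TIFF conversion, which the reporter did not specify, and the file names `test.pdf`, `test.tiff`, and `out` are placeholders.

```python
import subprocess


def pdf_to_tiff_cmd(pdf_path, tiff_path, dpi=300):
    """Build a Ghostscript command that rasterizes a PDF to a bilevel TIFF.

    Assumption: Ghostscript is the converter; the report does not name one.
    """
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=tiffg4",
        "-r{}".format(dpi),
        "-sOutputFile={}".format(tiff_path),
        pdf_path,
    ]


def tesseract_cmd(image_path, output_base, lang="eng"):
    """Build the tesseract command; the text is written to <output_base>.txt."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_repro(pdf_path="test.pdf", tiff_path="test.tiff", output_base="out"):
    """Run steps 3 and 4 and return the recognized text (placeholder file names)."""
    subprocess.check_call(pdf_to_tiff_cmd(pdf_path, tiff_path))
    subprocess.check_call(tesseract_cmd(tiff_path, output_base))
    with open(output_base + ".txt") as f:
        return f.read()
```

Calling `run_repro()` requires both `gs` and `tesseract` on the PATH and a `test.pdf` produced by steps 1 and 2.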

What is the expected output? What do you see instead?
The expected result is "test ___ test", but the actual output is "test test".
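
One way to check for this failure automatically is to compare the underscore runs in the expected text against the OCR output. A minimal sketch: the helper names `underscore_runs` and `missing_underscore_runs` are hypothetical, and the two strings are the ones quoted in this report.

```python
import re


def underscore_runs(text):
    """Return every run of one or more underscores found in text."""
    return re.findall(r"_+", text)


def missing_underscore_runs(expected, actual):
    """Count how many more underscore runs expected has than actual."""
    return max(0, len(underscore_runs(expected)) - len(underscore_runs(actual)))


expected = "test ___ test"
actual = "test test"  # output quoted in this report
print(missing_underscore_runs(expected, actual))  # prints 1
```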

What version of the product are you using? On what operating system?
tesseract version: 3.02
OS: Ubuntu 14.04

Please provide any additional information below.

Original issue reported on code.google.com by linhbm0...@gmail.com on 4 Mar 2015 at 9:01

GoogleCodeExporter commented 8 years ago
Tesseract ignores graphics. It treats the run of underscores as a graphic line rather than as text, so "___" is dropped from the output.

Original comment by zde...@gmail.com on 12 Apr 2015 at 4:02