gnewtothis101 / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Text gets chopped in the image before analysis #979

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
 0. Unzip buggydir.zip ; cd buggydir
 1. tesseract image.jpg image.txt (to generate output)
 2. tesseract image.jpg ignore segdemo inter (to view the reason for the buggy output)
(Note: reproducing step 2 requires the ScrollView debugger/viewer to be 
installed and in use, as described in 
https://code.google.com/p/tesseract-ocr/wiki/ViewerDebugging. To just see 
the output, refer to the attached screenshots described below.)

What is the expected output? What do you see instead?
 Expected output: 
Completely analysed text. For example: SUBTOTAL   11.48
 Output seen:
(from command 1):
Some characters are missing from the text written to image.txt. 
For example: SU 11.48 
(from command 2):
The ImageEditor window shows that the text is already chopped before 
analysis (see attached image BuggyBoundingBlocks.png).
Selecting the menu option Other->Show Image (see attached image 
BuggShowImage.png) shows the chopped text in faint black, while the rest, 
which is correctly recognized, has coloured borders. This seems to be the 
root cause of the buggy output. Any idea what causes this, and how it can be 
corrected?

What version of the product are you using? On what operating system?
 tesseract 3.02 on Ubuntu Linux 12.04 LTS 32-bit
 running on an Oracle Virtualbox VM configured with 2 GB memory
 host is Windows 7 64-bit with 4 GB physical RAM

Please provide any additional information below.
Reducing the size of the input image seems to make a difference: sometimes 
fewer characters get chopped, and sometimes a different set of characters 
gets chopped. I mention this in case it helps trace the root cause. 
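
For anyone who wants to repeat the resizing experiment, here is a rough sketch of how it could be scripted. It is not part of the original report. It assumes ImageMagick's `convert` is available for resizing (the report does not say which tool was used), and the helper names `scaled_name`, `build_commands` and `run_scales`, along with the scale factors, are made up for illustration. The OCR call uses the same `tesseract <image> <output base>` shape as step 1.

```python
import shutil
import subprocess
from pathlib import Path


def scaled_name(image, percent):
    """Return a name such as 'image_50.jpg' for a 50% copy of 'image.jpg'."""
    path = Path(image)
    return str(path.with_name(f"{path.stem}_{percent}{path.suffix}"))


def build_commands(image, percent):
    """Build the resize command and the OCR command for one scale factor."""
    scaled = scaled_name(image, percent)
    # tesseract adds the .txt extension to the output base itself.
    output_base = str(Path(scaled).with_suffix(""))
    return [
        # Assumption: ImageMagick's convert does the resizing.
        ["convert", image, "-resize", f"{percent}%", scaled],
        # Same invocation shape as step 1 of the report.
        ["tesseract", scaled, output_base],
    ]


def run_scales(image, percents):
    """Resize and OCR the image at each scale, then print the recognized text."""
    for percent in percents:
        for command in build_commands(image, percent):
            subprocess.run(command, check=True)
        text_file = Path(scaled_name(image, percent)).with_suffix(".txt")
        print(f"--- {percent}% ---")
        print(text_file.read_text(errors="replace"))


if __name__ == "__main__":
    tools_found = shutil.which("convert") and shutil.which("tesseract")
    if tools_found and Path("image.jpg").exists():
        # Hypothetical scale factors; the report does not list the sizes tried.
        run_scales("image.jpg", [100, 75, 50])
    else:
        print("convert, tesseract, or image.jpg not found; nothing to run")
```

Comparing the printed text across scales should make it easier to see which characters are dropped at which size.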

Please help! Thanks in advance :)

Original issue reported on code.google.com by amitrao....@gmail.com on 6 Sep 2013 at 12:21


GoogleCodeExporter commented 9 years ago
This issue was closed by revision r1034.

Original comment by theraysm...@gmail.com on 30 Jan 2014 at 2:21

GoogleCodeExporter commented 9 years ago
The issue is fixed, but the fix causes problems. More work required...

Original comment by theraysm...@gmail.com on 31 Jan 2014 at 3:11

GoogleCodeExporter commented 9 years ago
This issue was closed by revision r1043.

Original comment by theraysm...@gmail.com on 3 Feb 2014 at 7:16