gnewtothis101 / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Access Violation - reading outside image buffer during line detection #1496

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1.  Tesseract 3.02+ command line
2.  "tesseract -l eng Image_crop.png Image pdf"

What is the expected output? What do you see instead?
> I expect Tesseract to run and produce output.

> Instead, Tesseract crashes with "ACCESS VIOLATION (0xC0000005)"-type error.

What version of the product are you using? On what operating system?
Seen in Tesseract 3.02.02 and code from SVN around March 2015.
Windows 7
32-bit (Win32) Tesseract builds.

Please provide any additional information below.
- Does not happen in the 64-bit Windows build (possibly just luck).

- The attached image has non-white pixels at its edges, which seems to trigger 
the crash.

- The access violation occurs in TextlineProjection::MeanPixelsInLineSegment() 
when it calls GET_DATA_BYTE() (~line 550).  This can break when the 
start_pt/end_pt Y value is 0 and offset is negative.  It can also break when 
the start_pt/end_pt Y value is at the bottom of the image and offset is 
positive.  These conditions lead to attempted reads of data either before or 
after the image buffer.

- Similar problems can occur horizontally (i.e. X value = 0 or at the right 
edge of the image).  In these cases there is less chance of stepping outside 
the image buffer (unless at a corner), but a good chance that the algorithm 
will not read the intended data, because the read wraps around to the other 
side of the image.
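
To illustrate both failure modes described above, here is a minimal, self-contained sketch. It is not Tesseract or Leptonica code: GreyImage, UncheckedIndex, InBounds and ClampedPixel are hypothetical names, and the image is assumed to be a plain row-major byte buffer with no row padding. It shows why an X value of -1 wraps to the previous row, why a Y value of -1 or height falls outside the buffer, and one possible guard (clamping the coordinates). Whether clamping or skipping out-of-range samples is the right fix for MeanPixelsInLineSegment() is not something this report establishes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an 8-bit greyscale image (not a Tesseract or
// Leptonica type). Assumption: row-major storage with no row padding.
struct GreyImage {
  int width;
  int height;
  std::vector<std::uint8_t> data;  // width * height bytes
};

// Flat index that an unchecked read would use.
// x = -1 lands on the last pixel of the previous row (the "wrapping" case);
// y = -1 or y = height lies before or after the buffer (the crash case).
inline long UncheckedIndex(const GreyImage& img, int x, int y) {
  return static_cast<long>(y) * img.width + x;
}

// True if (x, y) addresses a pixel inside the image.
inline bool InBounds(const GreyImage& img, int x, int y) {
  return x >= 0 && x < img.width && y >= 0 && y < img.height;
}

// Reads a pixel after clamping both coordinates into the image, so an offset
// applied at an edge re-reads the edge pixel instead of leaving the buffer.
inline std::uint8_t ClampedPixel(const GreyImage& img, int x, int y) {
  assert(img.width > 0 && img.height > 0);
  assert(img.data.size() ==
         static_cast<std::size_t>(img.width) * img.height);
  const int cx = std::max(0, std::min(x, img.width - 1));
  const int cy = std::max(0, std::min(y, img.height - 1));
  return img.data[static_cast<std::size_t>(cy) * img.width + cx];
}
```

With a 3x2 image, UncheckedIndex(img, -1, 1) evaluates to 2, which is the last pixel of row 0 rather than anything in row 1, while ClampedPixel(img, -1, 1) returns the first pixel of row 1.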

Original issue reported on code.google.com by rtaylor...@gmail.com on 8 Jul 2015 at 10:53
