option pdf: many problems with the current master branch version

GoogleCodeExporter commented 9 years ago

I am experiencing many problems with the present master version.

* Memory problems
* Misrendered PDFs (if the PNG input, a screenshot, has a colored background)
* Re-converting the PDF to PNG with convert reports "Warning: File has insufficient data for an image … Please notify the author of the software that produced this file that it does not conform to Adobe's published PDF specification"

I went back to my previous version:
https://code.google.com/p/tesseract-ocr/source/detail?r=bce2cd5f331b66ee6b793c666d1701063473053e
This works very well.

If you are working heavily on the master version, I regard these as transient problems, but I wanted to let you know that the current and recent versions have problems with the "pdf" option. Please contact me if you need further information. If you use IRC, we could also chat there.

Original issue reported on code.google.com by syr...@gmail.com on 20 Sep 2014 at 7:24

Merged into: #1300

GoogleCodeExporter commented 9 years ago

The attached png file (created with "convert") crashes tesseract version

* https://code.google.com/p/tesseract-ocr/source/detail?r=bce2cd5f331b66ee6b793c666d1701063473053e

and also recent master

* https://code.google.com/p/tesseract-ocr/source/detail?r=9e8629d9effdde852c7bba862ba584442e7c5d93

Command and output:

tesseract test.png test pdf
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Info in pixReadStreamPng: converting (gray + alpha) ==> RGBA
Info in readHeaderMemPng: gray + alpha: will extract as RGBA (spp = 4)
Error during processing.
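
The Leptonica messages above show the input being read as gray + alpha. For anyone trying to reproduce this, here is a minimal sketch that checks which color type a PNG declares in its IHDR chunk, so you can tell whether a given screenshot falls into the same category as test.png. The helper names `png_color_type` and `has_alpha` are hypothetical and not part of tesseract or Leptonica. Whether the alpha channel actually causes the failure is an assumption this report does not confirm.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Color type codes defined for the IHDR chunk in the PNG specification.
COLOR_TYPES = {
    0: "gray",
    2: "rgb",
    3: "palette",
    4: "gray + alpha",
    6: "rgba",
}


def png_color_type(data: bytes) -> str:
    """Return the color type declared in the IHDR chunk of PNG bytes.

    Only the first 26 bytes are inspected: 8-byte signature, 4-byte chunk
    length, 4-byte chunk name, then width, height, bit depth, color type.
    """
    if len(data) < 26 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    if data[12:16] != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    _width, _height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return COLOR_TYPES.get(color_type, f"unknown ({color_type})")


def has_alpha(path: str) -> bool:
    """Hypothetical helper: True if the PNG at `path` declares an alpha channel."""
    with open(path, "rb") as fh:
        return png_color_type(fh.read(26)) in ("gray + alpha", "rgba")
```

Calling `has_alpha("test.png")` on the attached file would be expected to return True if the file header matches what Leptonica reported. This only inspects the header and does not decode the image.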

Original comment by syr...@gmail.com on 20 Sep 2014 at 9:00

Attachments:

test.png

GoogleCodeExporter commented 9 years ago

Original comment by zde...@gmail.com on 21 Sep 2014 at 2:47

Changed state: Duplicate

option pdf: many problems with the current master branch version #1321