Closed: faulpaul closed this issue 10 years ago.
Hi, great project! But I get the same error. Is there any chance to get it working on a CentOS 6 system?
AttributeError: 'module' object has no attribute 'useA85'
[root@ocr OCRmyPDF-2.x]# ./OCRmyPDF.sh -g -l deu img-140904112435.pdf img-140904112435-ocr.pdf
OCRmyPDF version: v2.0-stable
Arguments: -g -l deu img-140904112435.pdf img-140904112435-ocr.pdf
ImageMagick version:
Version: ImageMagick 6.5.4-7 2014-02-10 Q16 OpenMP http://www.imagemagick.org
Copyright: Copyright (C) 1999-2009 ImageMagick Studio LLC
GNU Parallel version:
WARNING: YOU ARE USING --tollef. IF THINGS ARE ACTING WEIRD USE --gnu.
GNU parallel 20130522
Copyright (C) 2007,2008,2009,2010,2011,2012,2013 Ole Tange and Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.
Web site: http://www.gnu.org/software/parallel
When using GNU Parallel for a publication please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
Poppler-utils version:
pdfimages version 0.12.4
Copyright 2005-2009 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC
pdftoppm version 0.12.4
Copyright 2005-2009 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC
pdffonts version 0.12.4
Copyright 2005-2009 The Poppler Developers - http://poppler.freedesktop.org
unpaper version:
tesseract version: tesseract 3.02.02 leptonica-1.69 libgif 4.1.6 : libjpeg 6b : libpng 1.2.49 : libtiff 3.9.4 : zlib 1.2.3
python2 version:
Ghostscript version:
Java version: java version "1.7.0_65" OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)
Created temporary folder: "/tmp/tmp.vA3gaG3Cyi"
Input file: Extracting size of each page (in pt)
parallel: Starting no more jobs. Waiting for 2 jobs to finish. This job failed:
./src/ocrPage.sh /home/OCRmyPDF-2.x/img-140904112435.pdf 0002\ 595\ 842 0003 /tmp/tmp.vA3gaG3Cyi 3 deu 1 0 0 0 0 1 '' 0
Traceback (most recent call last):
  File "./src/hocrTransform.py", line 286, in <module>
Commit b28ff40aea81cf0059ab274faf980e3a56b23f3a should solve the issue. Please check it (download: https://github.com/fritz-hh/OCRmyPDF/archive/5f173e5acb42b8bc594e3b8b3d5c9b42b5b4ea68.zip) and close the issue if it is solved. Thanks.
Hi fritz, thank you very much, that solves the issue on CentOS 6. You can close the issue.
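A minimal sketch of one defensive pattern, not necessarily what commit b28ff40 actually changes: the traceback in this thread fails on `rl_config.useA85 and '/A85 ' or ''`, so reading the setting with `getattr()` and a default avoids the AttributeError on a ReportLab build that does not define it. The helper names, the fallback of "no ASCII85", and the header format string (modelled on ReportLab's inline-image code, not copied from hocrTransform.py) are assumptions for illustration.

```python
import types


def a85_filter_prefix(rl_config):
    """Return the ASCII85 filter prefix for a PDF inline-image header.

    `rl_config` stands in for the reportlab.rl_config module. getattr()
    with a default means a ReportLab build that lacks the `useA85`
    setting no longer raises AttributeError; in that case this sketch
    assumes ASCII85 encoding is not wanted.
    """
    use_a85 = bool(getattr(rl_config, "useA85", False))
    return "/A85 " if use_a85 else ""


def inline_image_header(rl_config, width, height, bpc, color_space):
    # Hypothetical helper: builds the "BI ... ID" header of an inline
    # image, with Flate (/Fl) optionally preceded by ASCII85 (/A85).
    return "BI /W %d /H %d /BPC %d /CS /%s /F [%s/Fl] ID" % (
        width, height, bpc, color_space, a85_filter_prefix(rl_config))


if __name__ == "__main__":
    # Stand-in modules: one without the attribute, one with it enabled.
    old_config = types.ModuleType("rl_config")
    new_config = types.ModuleType("rl_config")
    new_config.useA85 = 1
    print(inline_image_header(old_config, 2495, 3501, 1, "DeviceGray"))
    print(inline_image_header(new_config, 2495, 3501, 1, "DeviceGray"))
```

Whatever the flag resolves to, the same value should also decide whether the Flate-compressed image data is actually ASCII85-encoded; otherwise the filter list in the header and the data that follows it disagree.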
I installed OCRmyPDF and Tesseract (latest version from Google sources) on my system, but converting a PDF (http://www.sentryfile.com/forum/attachments//ImageOnly.pdf) fails with this message:
[paul@host OCRmyPDF-2.0-stable]$ ./OCRmyPDF.sh -g ../ImageOnly.pdf ../test.pdf
OCRmyPDF version: v2.0-stable
Arguments: -g ../ImageOnly.pdf ../test.pdf
Checking if all dependencies are installed
ImageMagick version:
Version: ImageMagick 6.5.4-7 2014-02-10 Q16 OpenMP http://www.imagemagick.org
Copyright: Copyright (C) 1999-2009 ImageMagick Studio LLC
GNU Parallel version:
WARNING: YOU ARE USING --tollef. IF THINGS ARE ACTING WEIRD USE --gnu.
GNU parallel 20130522
Copyright (C) 2007,2008,2009,2010,2011,2012,2013 Ole Tange and Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.
Web site: http://www.gnu.org/software/parallel
When using GNU Parallel for a publication please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
Poppler-utils version:
pdfimages version 0.12.4
Copyright 2005-2009 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC
pdftoppm version 0.12.4
Copyright 2005-2009 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC
pdffonts version 0.12.4
Copyright 2005-2009 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC
unpaper version:
0.3
tesseract version: tesseract 3.02.02 leptonica-1.70 libjpeg 6b : libpng 1.2.49 : libtiff 3.9.4 : zlib 1.2.3
python2 version:
Python 2.6.6
Ghostscript version:
8.70
Java version: java version "1.5.0" gij (GNU libgcj) version 4.4.7 20120313 (Red Hat 4.4.7-4)
Copyright (C) 2007 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Created temporary folder: "/tmp/tmp.Tvo4xfNRjd"
Input file: Extracting size of each page (in pt)
Traceback (most recent call last):
  File "./src/hocrTransform.py", line 282, in <module>
    hocr.to_pdf(args.outputfile, args.image, args.boundingboxes)
  File "./src/hocrTransform.py", line 265, in to_pdf
    pdf.drawInlineImage(im, 0, 0, width=self.width, height=self.height)
  File "/usr/lib64/python2.6/site-packages/reportlab/pdfgen/canvas.py", line 598, in drawInlineImage
    img_obj = PDFImage(image, x,y, width, height)
  File "/usr/lib64/python2.6/site-packages/reportlab/pdfgen/pdfimages.py", line 41, in __init__
    self.getImageData()
  File "/usr/lib64/python2.6/site-packages/reportlab/pdfgen/pdfimages.py", line 154, in getImageData
    imagedata, imgwidth, imgheight = self.PIL_imagedata()
  File "./src/hocrTransform.py", line 82, in PIL_imagedata
    (imgwidth, imgheight, bpc, colorSpace, rl_config.useA85 and '/A85 ' or '')]
AttributeError: 'module' object has no attribute 'useA85'
Processing page 0001 / 0001
Page 0001: Size 792x612 (h*w in pt)
Page 0001: Size 3501x2495 (in pixel)
Page 0001: Extracting image as pbm file (306 dpi)
Page 0001: Performing OCR
Page 0001: Embedding text in PDF
Could not create PDF file from "/tmp/tmp.Tvo4xfNRjd/0001.hocr". Exiting...
parallel: Starting no more jobs. Waiting for 1 jobs to finish. This job failed:
./src/ocrPage.sh /home/paul/ImageOnly.pdf 0001\ 612\ 792 0001 /tmp/tmp.Tvo4xfNRjd 3 eng 1 0 0 0 0 1 '' 0
Maybe it is some dependency error, but I am not sure.
Paul
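If it is a dependency problem, a quick check is whether the ReportLab that the python2 interpreter loads (here from /usr/lib64/python2.6/site-packages/reportlab) defines `rl_config.useA85` at all. The sketch below is a hypothetical diagnostic, not part of OCRmyPDF; the function name is made up, and it only assumes that ReportLab exposes its version string as `reportlab.Version`. It avoids syntax newer than Python 2.6 so it can run on the interpreter shown in the log.

```python
def reportlab_a85_status():
    """Report the installed ReportLab version and whether
    reportlab.rl_config defines `useA85`.

    Returns None when ReportLab cannot be imported at all.
    """
    try:
        import reportlab
        from reportlab import rl_config
    except ImportError:
        return None
    version = getattr(reportlab, "Version", "unknown")
    return version, hasattr(rl_config, "useA85")


if __name__ == "__main__":
    status = reportlab_a85_status()
    if status is None:
        print("ReportLab is not installed for this interpreter")
    else:
        print("ReportLab %s, rl_config.useA85 present: %s" % status)
```

Run it with the same interpreter OCRmyPDF uses (the log reports Python 2.6.6); a result of `False` would match the AttributeError in the traceback above.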