fritz-hh / OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

AttributeError: 'module' object has no attribute 'useA85' #76

Closed faulpaul closed 10 years ago

faulpaul commented 10 years ago

I installed OCRmyPDF and Tesseract (latest version from the Google sources) on my system, but when converting a PDF (http://www.sentryfile.com/forum/attachments//ImageOnly.pdf) it fails with this message:

[paul@host OCRmyPDF-2.0-stable]$ ./OCRmyPDF.sh -g ../ImageOnly.pdf ../test.pdf
OCRmyPDF version: v2.0-stable
Arguments: -g ../ImageOnly.pdf ../test.pdf

Checking if all dependencies are installed

ImageMagick version:
Version: ImageMagick 6.5.4-7 2014-02-10 Q16 OpenMP http://www.imagemagick.org
Copyright: Copyright (C) 1999-2009 ImageMagick Studio LLC

GNU Parallel version:
WARNING: YOU ARE USING --tollef. IF THINGS ARE ACTING WEIRD USE --gnu.
GNU parallel 20130522
Copyright (C) 2007,2008,2009,2010,2011,2012,2013 Ole Tange and Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.

Web site: http://www.gnu.org/software/parallel

When using GNU Parallel for a publication please cite:

O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.

Poppler-utils version:
pdfimages version 0.12.4
Copyright 2005-2009 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC
pdftoppm version 0.12.4
Copyright 2005-2009 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC
pdffonts version 0.12.4
Copyright 2005-2009 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC

unpaper version:
0.3

tesseract version:
tesseract 3.02.02
leptonica-1.70
libjpeg 6b : libpng 1.2.49 : libtiff 3.9.4 : zlib 1.2.3

python2 version:
Python 2.6.6

Ghostscript version:
8.70

Java version:
java version "1.5.0"
gij (GNU libgcj) version 4.4.7 20120313 (Red Hat 4.4.7-4)

Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Created temporary folder: "/tmp/tmp.Tvo4xfNRjd"
Input file: Extracting size of each page (in pt)
Traceback (most recent call last):
  File "./src/hocrTransform.py", line 282, in <module>
    hocr.to_pdf(args.outputfile, args.image, args.boundingboxes)
  File "./src/hocrTransform.py", line 265, in to_pdf
    pdf.drawInlineImage(im, 0, 0, width=self.width, height=self.height)
  File "/usr/lib64/python2.6/site-packages/reportlab/pdfgen/canvas.py", line 598, in drawInlineImage
    img_obj = PDFImage(image, x,y, width, height)
  File "/usr/lib64/python2.6/site-packages/reportlab/pdfgen/pdfimages.py", line 41, in __init__
    self.getImageData()
  File "/usr/lib64/python2.6/site-packages/reportlab/pdfgen/pdfimages.py", line 154, in getImageData
    imagedata, imgwidth, imgheight = self.PIL_imagedata()
  File "./src/hocrTransform.py", line 82, in PIL_imagedata
    (imgwidth, imgheight, bpc, colorSpace, rl_config.useA85 and '/A85 ' or '')]
AttributeError: 'module' object has no attribute 'useA85'
Processing page 0001 / 0001
Page 0001: Size 792x612 (h*w in pt)
Page 0001: Size 3501x2495 (in pixel)
Page 0001: Extracting image as pbm file (306 dpi)
Page 0001: Performing OCR
Page 0001: Embedding text in PDF
Could not create PDF file from "/tmp/tmp.Tvo4xfNRjd/0001.hocr". Exiting...
parallel: Starting no more jobs. Waiting for 1 jobs to finish. This job failed:
./src/ocrPage.sh /home/paul/ImageOnly.pdf 0001\ 612\ 792 0001 /tmp/tmp.Tvo4xfNRjd 3 eng 1 0 0 0 0 1 '' 0

Maybe it is a dependency error, but I am not sure.

Paul
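One way to test the dependency hypothesis is to ask the installed reportlab whether its rl_config module exposes the useA85 attribute named in the traceback. The following is only a minimal diagnostic sketch: the helper name describe_reportlab is hypothetical and is not part of OCRmyPDF, and it assumes reportlab publishes its version string as reportlab.Version.

```python
def describe_reportlab():
    """Report the installed reportlab version and whether rl_config has useA85.

    Hypothetical helper for diagnosis only; it is not part of OCRmyPDF.
    """
    try:
        import reportlab
        from reportlab import rl_config
    except ImportError:
        return "reportlab is not installed"
    # Fall back to "unknown" if this build does not expose a version string.
    version = getattr(reportlab, "Version", "unknown")
    has_flag = hasattr(rl_config, "useA85")
    return "reportlab %s, rl_config.useA85 present: %s" % (version, has_flag)


print(describe_reportlab())
```

If the last value printed is False, the installed reportlab lacks the attribute that hocrTransform.py reads, which would be consistent with the AttributeError above.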

segro21 commented 10 years ago

Hi, great project! But I get the same error. Is there any chance of getting it working on a CentOS 6 system?

AttributeError: 'module' object has no attribute 'useA85'

[root@ocr OCRmyPDF-2.x]# ./OCRmyPDF.sh -g -l deu img-140904112435.pdf img-140904112435-ocr.pdf
OCRmyPDF version: v2.0-stable
Arguments: -g -l deu img-140904112435.pdf img-140904112435-ocr.pdf

Checking if all dependencies are installed

ImageMagick version:
Version: ImageMagick 6.5.4-7 2014-02-10 Q16 OpenMP http://www.imagemagick.org
Copyright: Copyright (C) 1999-2009 ImageMagick Studio LLC

GNU Parallel version:
WARNING: YOU ARE USING --tollef. IF THINGS ARE ACTING WEIRD USE --gnu.
GNU parallel 20130522
Copyright (C) 2007,2008,2009,2010,2011,2012,2013 Ole Tange and Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.

Web site: http://www.gnu.org/software/parallel

When using GNU Parallel for a publication please cite:

O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.

Poppler-utils version:
pdfimages version 0.12.4
Copyright 2005-2009 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC
pdftoppm version 0.12.4
Copyright 2005-2009 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC
pdffonts version 0.12.4
Copyright 2005-2009 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC

unpaper version:
0.3

tesseract version:
tesseract 3.02.02
leptonica-1.69
libgif 4.1.6 : libjpeg 6b : libpng 1.2.49 : libtiff 3.9.4 : zlib 1.2.3

python2 version:
Python 2.6.6

Ghostscript version:
8.70

Java version:
java version "1.7.0_65"
OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

Created temporary folder: "/tmp/tmp.vA3gaG3Cyi"
Input file: Extracting size of each page (in pt)
parallel: Starting no more jobs. Waiting for 2 jobs to finish. This job failed:
./src/ocrPage.sh /home/OCRmyPDF-2.x/img-140904112435.pdf 0002\ 595\ 842 0003 /tmp/tmp.vA3gaG3Cyi 3 deu 1 0 0 0 0 1 '' 0
Traceback (most recent call last):
  File "./src/hocrTransform.py", line 286, in <module>
    hocr.to_pdf(args.outputfile, args.image, args.boundingboxes)
  File "./src/hocrTransform.py", line 269, in to_pdf
    pdf.drawInlineImage(im, 0, 0, width=self.width, height=self.height)
  File "/usr/lib64/python2.6/site-packages/reportlab/pdfgen/canvas.py", line 598, in drawInlineImage
    img_obj = PDFImage(image, x,y, width, height)
  File "/usr/lib64/python2.6/site-packages/reportlab/pdfgen/pdfimages.py", line 41, in __init__
    self.getImageData()
  File "/usr/lib64/python2.6/site-packages/reportlab/pdfgen/pdfimages.py", line 154, in getImageData
    imagedata, imgwidth, imgheight = self.PIL_imagedata()
  File "./src/hocrTransform.py", line 86, in PIL_imagedata
    (imgwidth, imgheight, bpc, colorSpace, rl_config.useA85 and '/A85 ' or '')]
AttributeError: 'module' object has no attribute 'useA85'
Processing page 0001 / 0003
Page 0001: Size 842x595 (h*w in pt)
Page 0001: Size 3508x2480 (in pixel)
Page 0001: Extracting image as pgm file (300 dpi)
Page 0001: Performing OCR
Page 0001: Embedding text in PDF
Could not create PDF file from "/tmp/tmp.vA3gaG3Cyi/0001.hocr". Exiting...

Traceback (most recent call last):
  File "./src/hocrTransform.py", line 286, in <module>
    hocr.to_pdf(args.outputfile, args.image, args.boundingboxes)
  File "./src/hocrTransform.py", line 269, in to_pdf
    pdf.drawInlineImage(im, 0, 0, width=self.width, height=self.height)
  File "/usr/lib64/python2.6/site-packages/reportlab/pdfgen/canvas.py", line 598, in drawInlineImage
    img_obj = PDFImage(image, x,y, width, height)
  File "/usr/lib64/python2.6/site-packages/reportlab/pdfgen/pdfimages.py", line 41, in __init__
    self.getImageData()
  File "/usr/lib64/python2.6/site-packages/reportlab/pdfgen/pdfimages.py", line 154, in getImageData
    imagedata, imgwidth, imgheight = self.PIL_imagedata()
  File "./src/hocrTransform.py", line 86, in PIL_imagedata
    (imgwidth, imgheight, bpc, colorSpace, rl_config.useA85 and '/A85 ' or '')]
AttributeError: 'module' object has no attribute 'useA85'
Processing page 0002 / 0003
Page 0002: Size 842x595 (h*w in pt)
Page 0002: Size 3508x2480 (in pixel)
Page 0002: Extracting image as pgm file (300 dpi)
Page 0002: Performing OCR
Page 0002: Embedding text in PDF
Could not create PDF file from "/tmp/tmp.vA3gaG3Cyi/0002.hocr". Exiting...
parallel: Starting no more jobs. Waiting for 1 jobs to finish. This job failed:
./src/ocrPage.sh /home/OCRmyPDF-2.x/img-140904112435.pdf 0001\ 595\ 842 0003 /tmp/tmp.vA3gaG3Cyi 3 deu 1 0 0 0 0 1 '' 0

fritz-hh commented 10 years ago

Commit b28ff40aea81cf0059ab274faf980e3a56b23f3a should solve the issue. Please check (download: https://github.com/fritz-hh/OCRmyPDF/archive/5f173e5acb42b8bc594e3b8b3d5c9b42b5b4ea68.zip) and close the issue if it is solved. Thanks.
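The contents of that commit are not shown in this thread. Purely as an illustration, one common way to tolerate a reportlab build that lacks rl_config.useA85 is to read the attribute with a fallback instead of accessing it directly. In the sketch below, the function name a85_filter_prefix and the fallback value are assumptions, and a plain module object stands in for reportlab's rl_config so the example runs without reportlab installed.

```python
import types


def a85_filter_prefix(config, default=True):
    """Return '/A85 ' when ASCII85 encoding is enabled, else ''.

    `config` stands in for reportlab's rl_config module. If the build
    has no `useA85` attribute, `default` is used instead (an assumed
    value, not taken from the actual fix).
    """
    use_a85 = getattr(config, "useA85", default)
    return "/A85 " if use_a85 else ""


# Stand-in for an older rl_config that lacks the attribute.
old_config = types.ModuleType("rl_config")

# Stand-in for an rl_config where the attribute exists and is disabled.
new_config = types.ModuleType("rl_config")
new_config.useA85 = 0

print(repr(a85_filter_prefix(old_config)))  # falls back to the default
print(repr(a85_filter_prefix(new_config)))  # honors the explicit setting
```

The getattr call replaces the direct rl_config.useA85 access that raises in the traceback, so a missing attribute leads to the fallback rather than an AttributeError.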

segro21 commented 10 years ago

Hi fritz, thank you very much, that solves the issue on CentOS 6. You can close the issue.