Belval / pdf2image

A Python module that wraps the pdftoppm utility to convert PDFs to PIL Image objects
MIT License

Error: /ioerror in --image-- #222

Closed. orbanbalage closed this issue 1 year ago

orbanbalage commented 2 years ago

Describe the bug: pdf2image errors out instead of completing the process (some PDFs work, some don't).

➜  Downloads pdf2image output.pdf
Page-1
Page-2
Error: /ioerror in --image--
Operand stack:

Execution stack:
   %interp_exit   .runexec2   --nostringval--   image   --nostringval--   2   %stopped_push   --nostringval--   image   image   false   1   %stopped_push   1990   1   3   %oparray_pop   1989   1   3   %oparray_pop   1977   1   3   %oparray_pop   1833   1   3   %oparray_pop   --nostringval--   %errorexec_pop   .runexec2   --nostringval--   image   --nostringval--   2   %stopped_push   --nostringval--   image   1864   1   7   %oparray_pop
Dictionary stack:
   --dict:734/1123(ro)(G)--   --dict:0/20(G)--   --dict:76/200(L)--   --dict:65/75(L)--   --dict:18/25(L)--   --dict:0/15(L)--   --dict:0/15(L)--
Current allocation mode is local
Last OS error: No such file or directory
Current file position is 34815
GPL Ghostscript 9.54.0: Unrecoverable error, exit code 1
Error: Failed to launch Ghostscript!

Belval commented 2 years ago

Can you provide a sample PDF to reproduce the issue? This looks like a poppler/Ghostscript issue rather than a pdf2image one. Unfortunately, I can't fix bugs in poppler, as I have no visibility into the library.
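
One way to tell whether a failure comes from poppler rather than from the Python wrapper is to run pdftoppm directly on the same file. A minimal sketch, assuming poppler's pdftoppm is on PATH; the helper names and the 200 dpi default are illustrative and not part of pdf2image:

```python
import shutil
import subprocess
from typing import List, Optional


def build_pdftoppm_command(pdf_path: str, output_prefix: str, dpi: int = 200) -> List[str]:
    """Build a pdftoppm invocation that renders each page to a PNG file."""
    # -r sets the resolution in DPI, -png selects PNG output.
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, output_prefix]


def render_with_pdftoppm(pdf_path: str, output_prefix: str) -> Optional[subprocess.CompletedProcess]:
    """Run pdftoppm directly; return None when the binary is not on PATH."""
    if shutil.which("pdftoppm") is None:
        return None
    return subprocess.run(
        build_pdftoppm_command(pdf_path, output_prefix),
        capture_output=True,
        text=True,
    )
```

Calling `render_with_pdftoppm("output.pdf", "page")` and inspecting `returncode` and `stderr` on the result should show whether poppler itself rejects the file, independent of any wrapper.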

orbanbalage commented 2 years ago

Sorry, I thought I attached the file.

Indeed, gs found some issues with the file, but even after fixing them the error remains.

gs -dNOPAUSE -dBATCH -sDEVICE=nullpage output.pdf -sOutputFile=output-fix.pdf
GPL Ghostscript 9.54.0 (2021-03-30)
Copyright (C) 2021 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
   **** Warning:  File has an invalid xref entry:  2.  Rebuilding xref table.
Processing pages 1 through 2.
(...)
   **** This file had errors that were repaired or ignored.
   **** The file was produced by:
   **** >>>> itext-paulo-155 (itextpdf.sf.net - lowagie.com) <<<<
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.
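
The repair notices in output like the above can also be detected programmatically. A rough sketch, assuming the Ghostscript output has already been captured as a string; the function names are hypothetical, and the marker strings are taken from the log in this thread, so other Ghostscript versions may word them differently:

```python
from typing import List

WARNING_PREFIX = "**** Warning:"
REPAIRED_MARKER = "This file had errors that were repaired or ignored."


def ghostscript_warnings(output: str) -> List[str]:
    """Return the message text of every '**** Warning:' line."""
    warnings = []
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith(WARNING_PREFIX):
            warnings.append(stripped[len(WARNING_PREFIX):].strip())
    return warnings


def was_repaired(output: str) -> bool:
    """True when Ghostscript reports that it repaired or ignored errors."""
    return REPAIRED_MARKER in output
```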

Command to fix the file:

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output-fix.pdf output.pdf

Verify:

gs -dNOPAUSE -dBATCH -sDEVICE=nullpage output-fix.pdf
GPL Ghostscript 9.54.0 (2021-03-30)
Copyright (C) 2021 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 2.
Page 1
Page 2
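
The fix-then-verify steps above can be scripted. A minimal sketch, assuming gs is on PATH and using the same flags as the two commands in this comment; the helper names are hypothetical, and treating any "**** Warning" line as a failed verification is an assumption rather than documented Ghostscript behaviour:

```python
import shutil
import subprocess
from typing import List


def repair_command(src: str, dst: str) -> List[str]:
    """Rewrite src through the pdfwrite device, producing dst."""
    return ["gs", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite", f"-sOutputFile={dst}", src]


def verify_command(pdf_path: str) -> List[str]:
    """Interpret the file with the nullpage device, which discards all output."""
    return ["gs", "-dNOPAUSE", "-dBATCH", "-sDEVICE=nullpage", pdf_path]


def repair_and_verify(src: str, dst: str) -> bool:
    """Repair src into dst, then report whether dst parses without warnings."""
    if shutil.which("gs") is None:
        raise FileNotFoundError("gs (Ghostscript) is not on PATH")
    subprocess.run(repair_command(src, dst), check=True, capture_output=True, text=True)
    check = subprocess.run(verify_command(dst), capture_output=True, text=True)
    return check.returncode == 0 and "**** Warning" not in (check.stdout + check.stderr)
```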

I wanted to check the file with poppler, but I don't remember how I installed pdf2image and there were some conflicts in brew. I ended up uninstalling it, installing xpdf instead, and running:

pdfimages output.pdf output-images

which works.

Perhaps there are no images in the file at all, and that is the problem? Xpdf does produce images from the pages, which I think is what I wanted.

Here is the file if you want to check on your end.

output.pdf