coherentgraphics / cpdf-binaries

PDF Command Line Tools binaries for Linux, Mac, Windows
GNU Affero General Public License v3.0

Should we ship out broken files to Ghostscript automatically? #37

Closed peavine closed 5 years ago

peavine commented 5 years ago

I've encountered an error when using cpdf to process one particular PDF that is seven years old. I've not encountered this error with any other PDF over several years of cpdf use, so in many respects this is a very minor issue. My concern is that cpdf did not report an error that could be handled.

I run the following in a terminal window:

cpdf -squeeze /Users/peavine/Test/in.pdf -o /Users/peavine/Test/out.pdf

After running this command, the terminal shows line after line of "list length 0". To stop this, I have to exit the terminal.

If I run the following command, I receive the exact same result:

cpdf -info /Users/peavine/Test/in.pdf

The PDF opens in Preview without issue, and the Preview info sheet for the PDF shows:

Document type: PDF document
PDF version: 1.4
PDF Producer: Xenos D2eVision v2

BTW, the errant PDF contains financial records and, for this reason, I am unable to provide an example for troubleshooting.

Thanks.

Mac mini 2018 running Mojave with all updates
cpdf Version 2.2 (patchlevel 1, build of 1st September 2017) under special not-for-commercial-use license

johnwhitington commented 5 years ago

The gold standard for fixing such malformed PDF files is Ghostscript. As a workaround, I recommend preprocessing such malformed files (and only such files) with the following command:

gs -sDEVICE=pdfwrite -o out.pdf in.pdf

Does that work?
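For scripted use, the workaround can be wrapped so that Ghostscript is only invoked for files cpdf fails on. The following is a minimal sketch, not part of cpdf: the function name and the `CPDF`, `GS`, `TIMEOUT_CMD` and `CPDF_TIMEOUT` variables are hypothetical, and it assumes a `timeout` command (GNU coreutils; usually `gtimeout` when installed through Homebrew on a Mac) is available to stop a cpdf run that never finishes.

```shell
#!/bin/sh
# Sketch only: CPDF, GS, TIMEOUT_CMD and CPDF_TIMEOUT are hypothetical
# variables of this wrapper, not options of cpdf or Ghostscript.
CPDF="${CPDF:-cpdf}"
GS="${GS:-gs}"
TIMEOUT_CMD="${TIMEOUT_CMD:-timeout}"   # assumes GNU coreutils is installed
CPDF_TIMEOUT="${CPDF_TIMEOUT:-60}"      # seconds before a cpdf run counts as stuck

# squeeze_with_fallback IN OUT
# Returns 0 if cpdf produced OUT (directly or after mending), non-zero otherwise.
squeeze_with_fallback() {
    swf_in="$1"
    swf_out="$2"

    # First attempt: cpdf alone. Output is discarded so that a runaway
    # "list length 0" loop does not flood the terminal.
    if "$TIMEOUT_CMD" "$CPDF_TIMEOUT" "$CPDF" -squeeze "$swf_in" -o "$swf_out" >/dev/null 2>&1; then
        return 0
    fi

    # Fallback, reached only for files cpdf could not handle:
    # rewrite the file with Ghostscript, then retry cpdf on the result.
    swf_tmp="$(mktemp "${TMPDIR:-/tmp}/cpdf-mended.XXXXXX")" || return 1
    swf_status=1
    if "$GS" -sDEVICE=pdfwrite -o "$swf_tmp" "$swf_in" >/dev/null 2>&1 &&
       "$TIMEOUT_CMD" "$CPDF_TIMEOUT" "$CPDF" -squeeze "$swf_tmp" -o "$swf_out" >/dev/null 2>&1; then
        swf_status=0
    fi
    rm -f "$swf_tmp"
    return "$swf_status"
}
```

After sourcing the file, a call would look like `squeeze_with_fallback in.pdf out.pdf`. Well-formed files never touch Ghostscript, which matches the advice to preprocess only the malformed ones.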

peavine commented 5 years ago

I ran the PDF through Ghostscript and now the file works with cpdf. Thanks.

FWIW, the output from Ghostscript was:

GPL Ghostscript 9.27 (2019-04-04)
Copyright (C) 2018 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 10.
Page 1
Page 2
Loading NimbusMonoPS-Regular font from /usr/local/share/ghostscript/9.27/Resource/Font/NimbusMonoPS-Regular... 5325312 3978831 4504508 3063577 4 done.
Page 3
Page 4
Page 5
Loading NimbusRoman-Bold font from /usr/local/share/ghostscript/9.27/Resource/Font/NimbusRoman-Bold... 5757328 4409341 3346072 1916433 4 done.
Page 6
Page 7
Page 8
Page 9
Page 10

johnwhitington commented 5 years ago

Ok, it looks like Ghostscript didn't find a problem other than unembedded fonts, or at least the problem was so small it was not reported.
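One way to check a Ghostscript log for reported problems, rather than reading it by eye, is to filter it for warning and error lines. This sketch assumes Ghostscript marks such messages with a leading `****` (for example `**** Warning:` or `**** Error:`); that marker is an assumption about the log format, and the function name is hypothetical.

```shell
# gs_reported_problems: read a Ghostscript log on stdin and print any lines
# that begin (after optional whitespace) with the "****" marker.
# Exit status 0 means at least one such line was found, 1 means none.
gs_reported_problems() {
    grep -E '^[[:space:]]*\*\*\*\*'
}
```

It could be used as `gs -sDEVICE=pdfwrite -o out.pdf in.pdf 2>&1 | gs_reported_problems`. For a log like the one above, which contains only page and font-loading lines, it prints nothing.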

If you ever come across a file with the same problem which you can share, please send it.

johnwhitington commented 5 years ago

Note to self: add an option to mend files with Ghostscript automatically.

johnwhitington commented 5 years ago

Implemented for cpdf 2.3. When cpdf can't fix a file itself, it hands the file off to Ghostscript:

cpdf -gs "gs" -gs-malformed .....