dylanroy / pdfsizeopt

Automatically exported from code.google.com/p/pdfsizeopt

Add decryption support #51

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Hi,

I checked out the latest revision (r167) of pdfsizeopt on my Debian squeeze 
system and ran it on the attached file `test.pdf'.  This file is *not* 
protected.

$ python pdfsizeopt.py --use-multivalent=false test.pdf

info: This is pdfsizeopt.py r166 size=279220.
info: loading PDF from: test.pdf
info: loaded PDF of 313456 bytes
info: separated to 131 objs + xref + trailer
info: found 0 Type1 fonts loaded
info: found 0 Type1C fonts loaded
info: eliminated 21 duplicate objs
info: eliminated 3 unused objs in 3 classes
info: saving PDF with 107 objs to: test.pso.pdf
info: generated 303933 bytes (97%)

The output file however seems to be corrupted; various programs either cannot 
show it or demand some password:

- pdfinfo, xpdf (using libpoppler)
- GNOME evince (using libcairo)

Can you please have a look into this case.  Please let me know if some further 
information is needed.

Thanks a lot.
Mathias

Original issue reported on code.google.com by garbage-collection@gmx.net on 25 Aug 2011 at 2:29

GoogleCodeExporter commented 8 years ago
Thank you for taking time for submitting a useful bug report, with all details 
needed to reproduce the problem included.

The attached PDF is indeed encrypted: its trailer contains /Encrypt.

Please note that the PDF file format supports encryption without a (user) 
password, so viewer software can decrypt it without asking for a password. This 
is what happens with the attached test.pdf.
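
As a rough illustration of the /Encrypt check mentioned above, here is a minimal Python 3 sketch. It is not pdfsizeopt's actual code: the function name `looks_encrypted` is hypothetical, and the check is a heuristic that scans the raw bytes instead of parsing the trailer, so a matching byte sequence elsewhere in the file could produce a false positive.

```python
import re

# Heuristic only: match the /Encrypt key followed by either an indirect
# reference ("9 0 R") or an inline dictionary ("<<"). A different key such
# as /EncryptMetadata does not match, because its name continues after
# "/Encrypt".
_ENCRYPT_RE = re.compile(rb"/Encrypt\s*(?:\d+\s+\d+\s+R|<<)")


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Return True if the raw PDF bytes appear to declare an /Encrypt entry."""
    return _ENCRYPT_RE.search(pdf_bytes) is not None
```

To try it on a file, read the file in binary mode (`open("test.pdf", "rb").read()`) and pass the bytes to `looks_encrypted`. A real implementation would parse the trailer dictionary (or the cross-reference stream dictionary) instead of scanning the whole file.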

pdfsizeopt at r166 doesn't support encrypted input at all. This means that its 
behavior is undefined on encrypted input. The behavior for test.pdf is to 
finish successfully, but generate an invalid (useless, bad) output PDF, which 
confuses xpdf and evince. (Most probably there is no bug in xpdf and evince.)

As a quick fix, r168 now prints a proper error message when the input PDF is 
encrypted, so pdfsizeopt should fail early and clearly. You can use the 
following command to decrypt the PDF before passing it to pdfsizeopt:

  qpdf --decrypt test.pdf test.decrypted.pdf

Adding encrypted file reading support to pdfsizeopt would be a huge effort, and 
I don't have time for that in the foreseeable future, but I'm keeping this issue 
open just to track the progress, should there be any.

Original comment by pts...@gmail.com on 25 Aug 2011 at 8:33

GoogleCodeExporter commented 8 years ago
> Please note that the PDF file format supports encryption without a (user) 
> password, so viewer software can decrypt it without asking for a password. This 
> is what happens with the attached test.pdf .

Sorry for my early claim that it's not "protected". I could easily have 
noticed the opposite by checking with e.g. pdfinfo rather than only a PDF 
viewer.

Thank you for your quick reaction and the workaround using qpdf in the 
new revision.

Please note that I deleted the attached file test.pdf from my first post. 
Anyway, it can be downloaded freely from
http://www.swp-berlin.org/de/produkte/swp-studien-de/swp-studien-detail/article/private_militaerfirmen_im_voelkerrecht.html

Original comment by garbage-collection@gmx.net on 25 Aug 2011 at 11:30

GoogleCodeExporter commented 8 years ago

Original comment by pts...@gmail.com on 2 Apr 2012 at 5:25

GoogleCodeExporter commented 8 years ago
Many users would find this useful, especially if it was automated, so that 
manual decryption with qpdf.exe would not be necessary.

There are many external tools to decrypt PDFs:

* qpdf --decrypt (doesn't change anything else)
* Multivalent tool.pdf.Compress
* gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=unencrypted.pdf -c .setpdfwrite -f encrypted.pdf

However, some of them (e.g. Ghostscript and tool.pdf.Compress) do extra, 
unwanted transformations as well. We could also integrate decryption into 
pdfsizeopt, but it may be a bit slow.

Multivalent and qpdf are both able to decrypt PDFs. Maybe addn

Original comment by pts...@gmail.com on 2 Jul 2012 at 11:01

GoogleCodeExporter commented 8 years ago
qpdf --decrypt removed pdf-pseudo-encryption without unwanted changes in all of 
my test cases.
Multivalent and Ghostscript had unwanted side-effects and mutool often created 
damaged PDF files. So I stick with qpdf.

Original comment by Sebastia...@googlemail.com on 28 Jan 2014 at 1:12

GoogleCodeExporter commented 8 years ago
Thanks for the feedback. The newest version of pdfsizeopt recommends using qpdf 
--decrypt if needed. Does it show this message for your encrypted PDF?

It wouldn't be too hard to extend pdfsizeopt to call qpdf --decrypt 
automatically by default (could be disabled by --use-qpdf-decrypt=no). Would 
you like to have this feature?
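
To make the proposal concrete, here is a minimal Python 3 sketch of how such an option could behave. It is an assumption-laden illustration, not pdfsizeopt code: the helper names (`parse_bool_flag`, `qpdf_decrypt_argv`, `maybe_decrypt`) are hypothetical, the `is_encrypted` value is assumed to come from whatever check pdfsizeopt already performs, and only the `qpdf --decrypt IN OUT` invocation is taken from this thread.

```python
import subprocess
from typing import List


def parse_bool_flag(value: str) -> bool:
    """Parse a yes/no style value, e.g. the proposed --use-qpdf-decrypt=no."""
    lowered = value.strip().lower()
    if lowered in ("yes", "true", "1", "on"):
        return True
    if lowered in ("no", "false", "0", "off"):
        return False
    raise ValueError("expected yes or no, got: %r" % value)


def qpdf_decrypt_argv(src: str, dst: str) -> List[str]:
    """Build the argument list for `qpdf --decrypt SRC DST`."""
    return ["qpdf", "--decrypt", src, dst]


def maybe_decrypt(src: str, dst: str, is_encrypted: bool,
                  use_qpdf_decrypt: bool = True) -> str:
    """Return the path that should be loaded for optimization.

    Unencrypted input is passed through unchanged. Encrypted input is
    decrypted with qpdf unless the user disabled that, in which case the
    function fails early with a hint about the manual command.
    """
    if not is_encrypted:
        return src
    if not use_qpdf_decrypt:
        raise ValueError(
            "input PDF is encrypted; run: qpdf --decrypt %s %s" % (src, dst))
    # This sketch treats any nonzero qpdf exit status as a failure.
    subprocess.check_call(qpdf_decrypt_argv(src, dst))
    return dst
```

For example, `maybe_decrypt("test.pdf", "test.decrypted.pdf", is_encrypted=True)` would run qpdf and return `"test.decrypted.pdf"`, while passing `use_qpdf_decrypt=parse_bool_flag("no")` would raise an error that points the user to the manual command.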

Original comment by pts...@gmail.com on 28 Jan 2014 at 2:53