colemana / PyPDF2

A utility to read and write pdfs with Python. Superseded: see https://github.com/knowah/PyPDF2

getNumPages fails on encrypted PDF #1

Open arnaudbos opened 11 years ago

arnaudbos commented 11 years ago

I'm not an expert on the PDF file format, but I think a PDF file contains a "/Page" entry for each of its pages, and that this is visible even if the file is protected.

Also, there is the "/Type /Pages" entry, which gives a "/Count" of the number of pages in the document and is visible on a protected file too.
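To make the two observations concrete, here is a rough sketch that scans the raw bytes of a file without any PDF library. The function names and the sample bytes are made up for illustration, and it assumes the page objects are stored as plain, uncompressed objects. It will not see objects packed into object streams (PDF 1.5 and later), and in a nested page tree it simply takes the largest "/Count" as the document total.

```python
import re

# "/Type /Page" but not "/Type /Pages" (the lookahead rejects a longer name).
PAGE_RE = re.compile(rb"/Type\s*/Page(?![A-Za-z])")
PAGES_RE = re.compile(rb"/Type\s*/Pages(?![A-Za-z])")
COUNT_RE = re.compile(rb"/Count\s+(\d+)")
# Body of an indirect object: "N G obj ... endobj".
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj\b(.*?)endobj", re.DOTALL)


def count_page_objects(data: bytes) -> int:
    """Count the "/Type /Page" entries found in the raw file bytes."""
    return len(PAGE_RE.findall(data))


def page_tree_count(data: bytes):
    """Return the largest /Count among "/Type /Pages" objects, or None."""
    counts = []
    for match in OBJ_RE.finditer(data):
        body = match.group(1)
        # Only page tree nodes matter; outlines also carry a /Count entry.
        if PAGES_RE.search(body):
            count = COUNT_RE.search(body)
            if count:
                counts.append(int(count.group(1)))
    # The root node counts every page, so it holds the largest value.
    return max(counts) if counts else None


# Hypothetical two-page document body, for illustration only.
sample = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
    b"4 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
)
print(count_page_objects(sample), page_tree_count(sample))
```

This works on a protected file only because standard PDF encryption covers strings and streams, while names and integers such as "/Count" stay in the clear.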

So why is the getNumPages method so complicated? What am I missing?

knowah commented 11 years ago

You are correct. This is the most logical approach, since PDFs are required to include a root Document Catalog, which in turn is required to reference a Page Tree root dictionary containing the number of pages (true as of PDF 1.7). getNumPages() should now work with encrypted PDFs.
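The lookup described here (trailer, then Document Catalog, then Page Tree root, then "/Count") can be sketched as follows. This is an illustration, not the actual change made to getNumPages(). The function name is hypothetical, and it assumes the parsed trailer is available as nested mappings with indirect references already resolved.

```python
from typing import Any, Mapping


def num_pages_from_trailer(trailer: Mapping[str, Any]) -> int:
    """Read the page count from the root of the page tree."""
    catalog = trailer["/Root"]          # Document Catalog
    page_tree_root = catalog["/Pages"]  # root node of the Page Tree
    return int(page_tree_root["/Count"])  # total pages under the root


# Plain dictionaries standing in for a parsed two-page document.
fake_trailer = {"/Root": {"/Type": "/Catalog", "/Pages": {"/Type": "/Pages", "/Count": 2}}}
print(num_pages_from_trailer(fake_trailer))
```

Since "/Count" is an integer rather than a string or stream, this lookup does not need any decrypted content.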