galkahana / HummusJS

Node.js module for high performance creation, modification and parsing of PDF files and streams
http://www.pdfhummus.com

Determine owner vs user password #271

Open amhspencer opened 6 years ago

amhspencer commented 6 years ago

Currently you can use pdfReader.isEncrypted() to determine if a PDF has encryption or not. Is there any way to determine if the encryption is due to an owner password, user password, or both?

galkahana commented 6 years ago

Interesting one. I'm not sure there's a straightforward way. One thing of note: the owner password equals the user password in case you "don't have an owner password" (it's not empty or something, that would be silly, because then you could provide an empty password to open the doc as owner). So they are either the same or different, but they always exist.

What you could do to sort of guess that there's an interesting difference between owner and user is to check the P key of the encryption dict (see section 3.5.2 of the PDF Reference manual 1.7). If interesting stuff is not permitted, there's a good chance there's a different owner password that will allow those things.
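
To make the P key idea concrete, here is a minimal sketch that decodes a P value into named permissions. It assumes you have already read P as a number from the encryption dictionary (that step is left out here). The helper names `decodePermissions` and `hasRestrictions` are hypothetical and not part of HummusJS. Bit positions follow the user access permissions table in section 3.5.2, and bits 9 to 12 only carry meaning for security handler revision 3 or later.

```javascript
// Permission bits from the PDF Reference 1.7, section 3.5.2.
// Positions are 1-indexed from the low-order bit, as in the spec.
const PERMISSION_BITS = {
  print: 3,
  modify: 4,
  copy: 5,
  annotate: 6,
  fillForms: 9,
  extractForAccessibility: 10,
  assemble: 11,
  printHighQuality: 12,
};

// Hypothetical helper (not a HummusJS API): maps P to { name: boolean }.
function decodePermissions(p) {
  // P is stored as a signed 32-bit integer; view it as unsigned bits.
  const flags = p >>> 0;
  const result = {};
  for (const [name, bit] of Object.entries(PERMISSION_BITS)) {
    result[name] = (flags & (1 << (bit - 1))) !== 0;
  }
  return result;
}

// Hypothetical helper: true if at least one permission is withheld,
// which hints at (but does not prove) a distinct owner password.
function hasRestrictions(p) {
  return Object.values(decodePermissions(p)).some((allowed) => !allowed);
}
```

For instance, `hasRestrictions(-4)` is false because every permission bit is set, while a value such as -3904 clears all of them. Note that this only says something about the owner side; it does not tell you whether a user password is needed to open the file.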

hacking aside, why would you want to know that?

amhspencer commented 6 years ago

My use case is accepting PDFs where as long as we can open it and read it, it works, but we do not need to be able to edit or print or do any of the other actions controlled by access permissions. Think any situation where you apply to something and attach PDFs, where the reviewer will not need to do anything besides read it (college application, bank application). Thus, we are fine with encrypted pdfs if they only contain an owner password, but not if they contain a user password.

The P key does seem promising. It would even allow greater granularity -- say you need to open and read and possibly print, but never edit. Except then, seeing that the doc is encrypted and some permissions have been set and therefore there is an owner password, we cannot say whether or not there is also a user password required to open it.

cschwaderer commented 6 years ago

I have the very same use case: PDFs readable for everyone should be accepted, while PDFs which require a password in order to read them should be rejected. So, I'd love to have a HummusJS feature displaying the PDF permissions in detail. Meanwhile, I've found a hack: I simply check if the number of pages is greater than 0. If yes, it's obviously readable despite being encrypted.

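// This snippet runs inside a Promise executor, so resolve and reject are in scope.
// bufferPdf is a Buffer holding the uploaded PDF.
const pdfReader = hummus.createReader(new hummus.PDFRStreamForBuffer(bufferPdf));
const pages = pdfReader.getPagesCount();

if (pages > 0) {
  // Pages could be counted without a password, so the file is readable.
  resolve(true);
} else if (pdfReader.isEncrypted()) {
  // No pages and encrypted: a password is needed to read the file.
  reject("Input file is encrypted. Rejected!");
  return;
} else {
  reject("Unexpected behaviour. Number of pages is: '" + pages + "', but file is NOT encrypted.");
  return;
}
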
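
The same check can be sketched as a small synchronous function that returns a status instead of calling `resolve` or `reject` directly. This assumes, as in the snippet above, that a file needing a password reports zero pages. `classifyReadability` is a hypothetical name; the `reader` argument is any object exposing `getPagesCount()` and `isEncrypted()`, such as the reader created above.

```javascript
// Hypothetical helper (not a HummusJS API). Returns one of:
//   'readable'   - pages could be counted without a password
//   'encrypted'  - no pages and the file reports encryption
//   'unexpected' - no pages although the file is not encrypted
function classifyReadability(reader) {
  const pages = reader.getPagesCount();
  if (pages > 0) {
    return 'readable';
  }
  return reader.isEncrypted() ? 'encrypted' : 'unexpected';
}
```

The caller can then map `'readable'` to `resolve(true)` and the other two values to the matching `reject(...)` message, which keeps the decision logic testable with a stub reader.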
galkahana commented 6 years ago

The underlying C++ lib actually does support more questions: https://github.com/galkahana/PDF-Writer/blob/master/PDFWriter/DecryptionHelper.h#L46

When a PDF is opened, you can tell whether it was opened with an owner password: DidSucceedOwnerPasswordVerification.

By providing an empty password or no password, you can also tell whether a password is needed to read the file: DidFailPasswordVerification.
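
As a rough sketch of how those two answers could be combined for the use case in this thread: the function below takes two booleans that stand in for the results of the C++ methods after opening the file with an empty password. It assumes those results were surfaced to JavaScript, which this thread does not confirm, and `classifyPasswords` is a hypothetical name.

```javascript
// Hypothetical helper. Both arguments are plain booleans standing in for
// DidFailPasswordVerification and DidSucceedOwnerPasswordVerification,
// obtained after opening the file with an empty password.
function classifyPasswords(didFailPasswordVerification, didSucceedOwnerPasswordVerification) {
  if (didFailPasswordVerification) {
    // The empty password did not open the file: a user password is required.
    return 'user-password-required';
  }
  if (didSucceedOwnerPasswordVerification) {
    // The empty password also verified as owner: no distinct owner password.
    return 'no-distinct-owner-password';
  }
  // Readable with an empty password, but not as owner.
  return 'owner-password-only';
}
```

Under the policy described earlier in the thread, only `'user-password-required'` would lead to a rejection.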