cantoo-scribe / pdf-lib

Create and modify PDF documents in any JavaScript environment
https://pdf-lib.js.org
MIT License
136 stars 27 forks source link

Implement encryption #54

Closed programmarchy closed 4 months ago

programmarchy commented 5 months ago

Based on PhakornKiong/pdf-lib#dev/DocEncrypt

What?

This PR adds the ability to encrypt a PDF document. It's based on PhakornKiong's PR https://github.com/Hopding/pdf-lib/pull/1015, which sadly did not get merged. I've updated his code to be compatible with the latest changes to pdf-lib. Tried to simplify where possible, minimizing changes to PDFDocument and PDFContext.

Why?

I needed PDF encryption on a project. This repo has already merged decryption. Let's have both!

How?

Uses CryptoJS to encrypt streams.

Testing?

I've tested it for my personal use case. I have not implemented unit tests yet. Want some assurance it will get merged before working on unit tests.

New Dependencies?

CryptoJS.

Screenshots

N/A

Suggested Reading?

Yes, also read the Encryption part of the spec.

Anything Else?

Checklist

Sharcoux commented 5 months ago

I'll have a look at it as soon as I have a moment. Thanks very much for driving this here.

Sharcoux commented 5 months ago

@programmarchy Everything seems alright. I'll be able to merge. I'm just not sure about the change from PDFHeader.forVersion(1, 7); to this.context.header. I don't understand either the original code nor the new one to be honest :sweat_smile:

programmarchy commented 5 months ago

@Sharcoux I know that the PDF version dictates which encryption algorithm (represented by V) is used.

But you ask a good question worth investigating. There may be a reason that version 1.7 was hardcoded. For example, maybe pdf-lib can read version 1.4 PDFs, but only supports writing 1.7-compatible PDFs?

If we determine that pdf-lib will only support writing 1.7 PDFs, then we could eliminate the code that handles encryption for other PDF versions.

Sharcoux commented 5 months ago

Do you have time to run those investigations? Otherwise, we should take the safest approach, which is probably to keep the original code. If that raises problems, we can solve them as they go.

programmarchy commented 5 months ago

@Sharcoux Agree on taking the safer approach. I'm good only supporting 1.7. So I'll simplify PDFSecurity to only support that version.

Also, this PR has a bug where encrypted PDFs cannot be opened Adobe Acrobat. It's stricter than other readers, so I need to fix that also.

Sharcoux commented 5 months ago

@Sharcoux Agree on taking the safer approach. I'm good only supporting 1.7. So I'll simplify PDFSecurity to only support that version.

Also, this PR has a bug where encrypted PDFs cannot be opened Adobe Acrobat. It's stricter than other readers, so I need to fix that also.

Maybe you should keep the support for other versions and only add a comment above the part: const header = PDFHeader.forVersion(1, 7); mentioning that investigations are required to know why this is there. This way, we don't lose your work.

RippleRurigaki commented 5 months ago

@programmarchy

But you ask a good question worth investigating. There may be a reason that version 1.7 was hardcoded. For example, maybe pdf-lib can read version 1.4 PDFs, but only supports writing 1.7-compatible PDFs?

If we determine that pdf-lib will only support writing 1.7 PDFs, then we could eliminate the code that handles encryption for other PDF versions.

pdf-lib does not seem to assume anything other than version 1.7. I can't find any version checks for added functionality or deprecated functionality by PDF version. Maybe that's why version 1.7 is hardcoded because of the simplification.

Add:

I found the comment mentioned by @Hopding

https://github.com/Hopding/pdf-lib/issues/916#issuecomment-925402016

Sharcoux commented 4 months ago

@programmarchy

But you ask a good question worth investigating. There may be a reason that version 1.7 was hardcoded. For example, maybe pdf-lib can read version 1.4 PDFs, but only supports writing 1.7-compatible PDFs? If we determine that pdf-lib will only support writing 1.7 PDFs, then we could eliminate the code that handles encryption for other PDF versions.

pdf-lib does not seem to assume anything other than version 1.7. I can't find any version checks for added functionality or deprecated functionality by PDF version. Maybe that's why version 1.7 is hardcoded because of the simplification.

Add:

I found the comment mentioned by @Hopding

Hopding#916 (comment)

Nice, but then, this means that we should probably keep things the way they were. Can you revert that part and add a comment near the hard codded header linking to Hopding message or this conversation?

programmarchy commented 4 months ago

I reverted the version header part.

However, there's still the outstanding issue with encrypted PDFs not able to be read by Adobe Acrobat. Still looking into that. Acrobat doesn't seem to output any useful information for troubleshooting. Going to compare with other PDF libs.

Sharcoux commented 4 months ago

Ah, my bad, I saw your comment too late. I merged but I won't release waiting for your fix.

Sharcoux commented 4 months ago

@programmarchy Will you still try to make the pdfs be opened with Adobe Acrobet?

programmarchy commented 4 months ago

@Sharcoux It's on my todo list but I don't have a timeline of when I can finish it. Partly because of my schedule, and partly because I'm a bit stuck, frankly.

I know that qpdf is able to encrypt the same PDFs I tested with and they open successfully. Their encryption code is here: https://github.com/qpdf/qpdf/blob/main/libqpdf/QPDF_encryption.cc but I haven't finished comparing the implementations.

If we can attract others to help solve the problem I think that'd be ideal.

programmarchy commented 4 months ago

@PhakornKiong Hope it's okay that I am reaching out to you here. We're working on merging in your PDF encryption implementation into this fork of pdf-lib. The implementation works great for most PDF viewers, however, Adobe Acrobat cannot open the file.

Do you happen to have any insight?

image

RippleRurigaki commented 4 months ago

@programmarchy

Try adding types that do not compress in the object stream.

PDFStreamWriter.ts

      const shouldNotCompress =
        ref === this.context.trailerInfo.Encrypt ||
        object instanceof PDFStream ||
        object instanceof PDFInvalidObject ||
        object instanceof PDFCatalog ||   //Add
        object instanceof PDFPageLeaf ||  //Add
        object instanceof PDFPageTree || //Add
        ref.generationNumber !== 0;

and Import.

Because,In PDF Specifications

In linearized files (see Annex F, "Linearized PDF" and Annex G, "Linearized PDF access strategies"), the document catalog dictionary, the linearization dictionary, and page objects shall not appear in an object stream.

Only a simple test, but the error seems to have disappeared.

RippleRurigaki commented 4 months ago

Ah?

https://github.com/cantoo-scribe/pdf-lib/pull/54#issuecomment-2186396859

Perhaps this should be a separate issue from encryption? There should be an impact on Object Stream output regardless of encryption.

programmarchy commented 4 months ago

@RippleRurigaki Yep, that fixed the issue with Acrobat! Nice find... I'll open a new PR.

Sharcoux commented 4 months ago

Released in version 2.2.0