GlenKPeterson / PdfLayoutManager

Adds line-breaking, page-breaking, tables, and styles to PDFBox

Changes to support Protection and Attachments #20

Open bshouse opened 7 years ago

bshouse commented 7 years ago

I attempted to make minimal changes to ensure users have the ability to attach files and protect the PDF as needed. I also wrote up a simple test that should write-protect the document and include the sample PNG as an attachment.

GlenKPeterson commented 7 years ago

Thank you! I just tried it out (cloned, built, and opened) and I have a few questions:

If I'm confused, I assume someone else using this will be confused, so I'd like to clear these up before merging. Thanks again for working on this!

bshouse commented 7 years ago

I run Linux and Evince too and have the same experience: it shows neither the attachment nor the protection. However, the Adobe PDF reader on Windows shows both. The owner password for the document is "ownerPassword", as set in the test case. The user password is blank. PDFBox uses BouncyCastle to encrypt the PDF. If you remove that dependency and try to run the test, it fails with an error saying BouncyCastle was not found.
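In case it helps when reproducing this, here is a minimal sketch of probing the classpath for BouncyCastle before running the test. `ClasspathCheck` and `isAvailable` are hypothetical names made up for illustration. The provider class name is BouncyCastle's usual JCE provider, and whether PDFBox loads exactly that class is an assumption on my part.

```java
public class ClasspathCheck {

    // Returns true if the named class can be located by this class's loader.
    // The class is not initialized, so no static initializers run.
    public static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: this is the provider class the encryption step needs.
        String provider = "org.bouncycastle.jce.provider.BouncyCastleProvider";
        System.out.println("BouncyCastle on classpath: " + isAvailable(provider));
    }
}
```

If this prints `false`, the missing-dependency failure described above is the likely result, though the exact error text depends on the PDFBox version.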

bshouse commented 7 years ago

I made the requested change and also fixed the tabbing in pom.xml. I noticed that if I use Firefox to view the PDF, the text is not visible, but the attachment is. When you load it in Firefox there is a square in the upper-left corner with two embedded rectangles. Clicking on it reveals a paper-clip icon to its left, and clicking on that icon reveals the attachment. I have still not found a way to confirm the write-protection on Linux, other than the inability to open the file with OpenOffice without a password.

bshouse commented 7 years ago

Okay, I found some other Linux tools that come as part of the Poppler package:

For Encryption:

$ pdfinfo -struct testProtectAttach.pdf
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          1
Encrypted:      yes (print:yes copy:yes change:no addNotes:yes algorithm:RC4)
Page size:      612 x 792 pts (letter)
Page rot:       90
File size:      40435 bytes
Optimized:      no
PDF version:    1.4

For Attachments:

$ pdfdetach -list testProtectAttach.pdf 
1 embedded files
1: sampleScreenShot.png

GlenKPeterson commented 7 years ago

Thank you for continuing to work on this.

  1. Firefox thinks melon.jpg is a text file. Have you set the attachment type (or mime type or whatever) correctly?

  2. Adobe Acrobat (or creative suite PDF writer or whatever it's called now) is kind of like the reference implementation for PDFs. Do you have access to that software? Can you (or a graphic artist) manually create the same/desired file using Adobe Writer, then see how that compares to what we're producing here? If it behaves the same way in all the software we're using to test, that would suggest that we've created a "correct" PDF and that the other software was at fault. Ideally, you'd create it with the same text, same font (Helvetica is built into the PDF spec), same image, same password, same protection, but even something similar would give me a better idea of how "close to spec" our resulting PDF file was.

  3. I'm still unclear on whether the whole document is supposed to be encrypted, or just the attachment, or just the password. Also, is the encryption supposed to provide read protection (in which case it should encrypt the whole file), or just write protection (in which case it probably only encrypts the password)?

bshouse commented 7 years ago

  1. Hmm. Strange. There is no melon.jpg anywhere in the PDF. The test attaches sampleScreenShot.png and does set the appropriate MIME type.

  2. I am not a PDF expert, just a hack that made it work. I will hold no ill will if you want to drop the request. Merging it would just save me time when you publish an update. The only software I have access to is the Adobe PDF reader. The resulting PDF from the test met my expectations of holding attachments and being protected against writing.

  3. Again, not a PDF expert, but my understanding follows. PDF offers a few levels of protection. Owner protections allow the creator to control the following permissions using the ProtectionPolicy:

    Encrypted:      yes (print:yes copy:yes change:no addNotes:yes algorithm:RC4)

    A user password is used to limit who can open the PDF (only those with the password).
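For what it's worth, here is a rough sketch of how those yes/no permissions map onto bits. It assumes the standard security handler's permission layout from the PDF 1.7 specification: bit 3 print, bit 4 modify, bit 5 copy, bit 6 add or modify annotations. `PermissionFlags` and its members are hypothetical names, not PDFBox or PdfLayoutManager API, and the flag value used in `main` is illustrative rather than read from the test PDF.

```java
public class PermissionFlags {

    // The PDF specification numbers bits from 1, so bit n has value 1 << (n - 1).
    public static final int PRINT = 1 << 2;      // bit 3
    public static final int CHANGE = 1 << 3;     // bit 4
    public static final int COPY = 1 << 4;       // bit 5
    public static final int ADD_NOTES = 1 << 5;  // bit 6

    private static String yesNo(int flags, int mask) {
        return (flags & mask) != 0 ? "yes" : "no";
    }

    // Formats the four permissions in the same order pdfinfo prints them.
    public static String describe(int flags) {
        return "print:" + yesNo(flags, PRINT)
                + " copy:" + yesNo(flags, COPY)
                + " change:" + yesNo(flags, CHANGE)
                + " addNotes:" + yesNo(flags, ADD_NOTES);
    }

    public static void main(String[] args) {
        // Illustrative value: every permission granted except changing the document.
        int flags = ~CHANGE;
        System.out.println(describe(flags));
    }
}
```

With only the change bit cleared, `describe` yields the same four flags that pdfinfo reported for testProtectAttach.pdf. A real permission value also has reserved bits with fixed settings, which this sketch ignores.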

I am sorry I don't have more expertise in the area and I understand your concerns with a stranger asking for changes. I too have limited understanding and just did what I needed to meet requirements. Again, please feel free to drop the pull request if you feel it introduces too much risk. I surely don't want to lead you down a messy path that would lead to the extinction of this project.

Thanks again for the good work.