MatthiasValvekens / pyHanko

pyHanko: sign and stamp PDF files
MIT License

pyHanko adding itself as PDF producer #223

Closed joseavegaa closed 1 year ago

joseavegaa commented 1 year ago

Describe the bug: pyHanko adds itself as the PDF producer, and there is no option to enable or disable that behavior.

To Reproduce: Sign any PDF file using pyHanko, then check the advanced properties of the file.

Expected behavior: An option to enable or disable pyHanko adding itself to the PDF document metadata, leaving everything else as-is.

Screenshots: Screenshot_20230220_023856

Additional context: N/A

Thanks for the help with this issue, and for the amazing work with pyHanko, which is incredibly useful.

MatthiasValvekens commented 1 year ago

Hi @joseavegaa,

You can currently disable all metadata updates by overriding the _update_meta method on the writer (either by monkeypatching it or by creating your own subclass). In fact, some tests already do this by setting w._update_meta = lambda: None. Be aware that this is technically an internal piece of API, though.
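For readers who want to try this, here is a minimal sketch of the monkeypatching approach. The helper names `disable_metadata_updates` and `sign_without_metadata_update` are made up for illustration, as is the default field name `Signature1`. The sketch assumes `signer` is an already-configured pyHanko signer (for example one loaded via `signers.SimpleSigner.load`). Since `_update_meta` is internal API, this may stop working in a future release.

```python
def disable_metadata_updates(writer):
    """Replace the writer's internal _update_meta hook with a no-op.

    Assigning on the instance shadows the bound method for this object only;
    other writers are unaffected.
    """
    writer._update_meta = lambda: None
    return writer


def sign_without_metadata_update(in_path, out_path, signer, field_name="Signature1"):
    """Sign a PDF with pyHanko while skipping the metadata update.

    Hypothetical helper: `signer` is assumed to be a configured pyHanko signer.
    """
    # Local imports keep disable_metadata_updates usable without pyHanko installed.
    from pyhanko.pdf_utils.incremental_writer import IncrementalPdfFileWriter
    from pyhanko.sign import signers

    with open(in_path, "rb") as inf:
        w = disable_metadata_updates(IncrementalPdfFileWriter(inf))
        # With no explicit output stream, sign_pdf returns an in-memory buffer.
        out = signers.sign_pdf(
            w,
            signers.PdfSignatureMetadata(field_name=field_name),
            signer=signer,
        )

    with open(out_path, "wb") as outf:
        outf.write(out.getvalue())
```

The subclass route mentioned above would look much the same: derive from `IncrementalPdfFileWriter` and define `_update_meta(self)` with an empty body.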

I'm genuinely curious as to why you want to disable metadata updates as a matter of course, though. Updating the metadata (incl. the producer string) is really quite standard practice; most PDF tools---including document signing tools---that I'm aware of do the same thing. As such, I wouldn't consider this a bug, since the current behaviour is very much intentional.

Care to enlighten me? :)

joseavegaa commented 1 year ago

Hi! Thanks for the fast answer, I'll let you know how it goes.

Basically, we're working with historic documentation that cannot be altered in any way, except by the digital signature. Any modification of the metadata or of the image of the PDF will "invalidate" the document from a legal standpoint.

I know that is not the normal case, but my country isn't exactly the most technologically advanced.

If you have any other questions or would like more information, please send me a private message and I'll be glad to answer them.

Thanks again for the help :)

MatthiasValvekens commented 1 year ago

I see. Well, I'm not sure if it matters, but due to the way PDF signing works, the signed payload is actually always an altered version of the document: the signature has to be computed after adding a signature form field, allocating a signature container, etc. In some technical sense, this will always violate a "no changes whatsoever" kind of policy. That's also part of the reason why updating metadata is common practice, FWIW.

Some unsolicited advice: perhaps a CAdES-style signature computed over the original PDF document (and stored in a separate file) is a better fit for your use case? Or do you absolutely have to use PDF signatures?
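To make the detached-signature suggestion concrete, here is a rough sketch of the workflow: sign the original bytes, store the signature in a sidecar file, and check that the PDF itself was not touched. Everything named here (`write_detached_signature`, `cms_detached_signer`, the `.p7s` suffix) is an assumption for illustration, not something from pyHanko. The optional signer uses the `cryptography` package's PKCS#7 builder, which produces a plain detached CMS signature; it does not add the extra signed attributes a full CAdES profile requires.

```python
import hashlib
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_detached_signature(
    pdf_path, sign_bytes: Callable[[bytes], bytes], suffix: str = ".p7s"
) -> Path:
    """Sign the PDF's bytes and store the result next to it, leaving the PDF as-is."""
    pdf_path = Path(pdf_path)
    digest_before = sha256_of(pdf_path)

    signature = sign_bytes(pdf_path.read_bytes())
    sig_path = pdf_path.with_name(pdf_path.name + suffix)
    sig_path.write_bytes(signature)

    # Sanity check: the original document must be byte-for-byte unchanged.
    if sha256_of(pdf_path) != digest_before:
        raise RuntimeError("original document changed during signing")
    return sig_path


def cms_detached_signer(cert, key) -> Callable[[bytes], bytes]:
    """Build a sign_bytes callable backed by the `cryptography` package.

    Assumption: `cert` and `key` are a certificate and private key already
    loaded with `cryptography`. The output is plain CMS, not full CAdES.
    """
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.serialization import pkcs7

    def sign_bytes(data: bytes) -> bytes:
        return (
            pkcs7.PKCS7SignatureBuilder()
            .set_data(data)
            .add_signer(cert, key, hashes.SHA256())
            .sign(
                serialization.Encoding.DER,
                [pkcs7.PKCS7Options.DetachedSignature, pkcs7.PKCS7Options.Binary],
            )
        )

    return sign_bytes
```

Passing the signing step in as a callable keeps the file handling independent of the signing library, so a CAdES-capable signer could be swapped in without changing the rest.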

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions!