MatthiasValvekens / pyHanko

pyHanko: sign and stamp PDF files
MIT License

ValueError("Invalid padding bytes.") when trying to decrypt Adobe.PubSec encrypted pdf file #412

Closed redfast00 closed 3 months ago

redfast00 commented 3 months ago

Describe the bug

Crash when trying to decrypt Adobe.PubSec encrypted file

To Reproduce

I'm afraid this will be very hard to reproduce, since I can't share the files used to reproduce this. This is the output:

(venv) $ pyhanko --verbose decrypt pkcs12 --force encrypted.pdf decrypted.pdf key.p12
2024-03-25 10:25:57,582 - root - DEBUG - Running with --verbose
2024-03-25 10:25:57,582 - root - DEBUG - There was no configuration to parse.
Key passphrase: 
2024-03-25 10:26:03,349 - cli - ERROR - Generic processing error.
Traceback (most recent call last):
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 772, in _import_object
    return reference_map[obj.reference]
KeyError: Reference(idnum=26, generation=0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 772, in _import_object
    return reference_map[obj.reference]
KeyError: Reference(idnum=27, generation=0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 772, in _import_object
    return reference_map[obj.reference]
KeyError: Reference(idnum=34, generation=0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 772, in _import_object
    return reference_map[obj.reference]
KeyError: Reference(idnum=168, generation=0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 772, in _import_object
    return reference_map[obj.reference]
KeyError: Reference(idnum=296, generation=0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 772, in _import_object
    return reference_map[obj.reference]
KeyError: Reference(idnum=300, generation=0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/cli/runtime.py", line 50, in pyhanko_exception_manager
    yield
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/cli/commands/crypt.py", line 187, in _decrypt_pubkey
    w = copy_into_new_writer(r)
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 1262, in copy_into_new_writer
    new_root_dict = w._import_object(
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 835, in _import_object
    return generic.ArrayObject(
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 836, in <genexpr>
    self._import_object(v, reference_map, obj_stream) for v in obj
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 835, in _import_object
    return generic.ArrayObject(
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 836, in <genexpr>
    self._import_object(v, reference_map, obj_stream) for v in obj
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 835, in _import_object
    return generic.ArrayObject(
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 836, in <genexpr>
    self._import_object(v, reference_map, obj_stream) for v in obj
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 769, in _import_object
    obj = obj.decrypted
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/generic.py", line 2067, in decrypted
    decrypted = pdf_string(cf.decrypt(local_key, obj.original_bytes))
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/crypt/filter_mixins.py", line 134, in decrypt
    return aes_cbc_decrypt(key, data, iv)
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/pyhanko/pdf_utils/crypt/_util.py", line 20, in aes_cbc_decrypt
    return unpadder.update(plaintext) + unpadder.finalize()
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/cryptography/hazmat/primitives/padding.py", line 160, in finalize
    result = _byte_unpadding_check(
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/cryptography/hazmat/primitives/padding.py", line 97, in _byte_unpadding_check
    raise ValueError("Invalid padding bytes.")
ValueError: Invalid padding bytes.
Error: Generic processing error.

Expected behavior

The document decrypts.

Environment (please complete the following information):

$ pip3 freeze
asn1crypto==1.5.1
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
cryptography==42.0.5
idna==3.6
oscrypto==1.3.0
pycparser==2.21
pyHanko==0.23.2
pyhanko-certvalidator==0.26.3
pypng==0.20220715.0
PyYAML==6.0.1
qrcode==7.4.2
requests==2.31.0
typing_extensions==4.10.0
tzlocal==5.2
uritools==4.0.2
urllib3==2.2.1

Python 3.10.12

Additional context

$ openssl pkcs12 -legacy -info -in key.p12 -noout                                                     
Enter Import Password:
MAC: sha1, Iteration 100000
MAC length: 20, salt length: 20
PKCS7 Data
Shrouded Keybag: pbeWithSHA1And3-KeyTripleDES-CBC, Iteration 50000
PKCS7 Encrypted data: pbeWithSHA1And40BitRC2-CBC, Iteration 50000
Certificate bag
Certificate bag

I've had problems with the 40 bit RC2 in the past, so I upgraded the key by following https://www.docuseal.co/docs/convert-legacy-p12-pfx-files-to-support-openssl-3, but I still have the same problem.

$ openssl pkcs12 -legacy -info -in key_new.p12 -noout
Enter Import Password:
MAC: sha256, Iteration 2048
MAC length: 32, salt length: 8
PKCS7 Encrypted data: PBES2, PBKDF2, AES-256-CBC, Iteration 2048, PRF hmacWithSHA256
Certificate bag
Certificate bag
PKCS7 Data
Shrouded Keybag: PBES2, PBKDF2, AES-256-CBC, Iteration 2048, PRF hmacWithSHA256

redfast00 commented 3 months ago

I can confirm that I can decrypt the document with the code in https://github.com/py-pdf/pypdf/pull/2537

MatthiasValvekens commented 3 months ago

Interesting... can you try with the previous release? 0.23.1

I can't test myself right now, unfortunately, but I'll take a look later.

redfast00 commented 3 months ago

Yes, with pip install 'pyHanko[pkcs11,image-support,opentype,xmp]==0.23.1', I get the same error.

Additionally, while trying to decrypt another document (that I also successfully decrypted before with my patch to pypdf), I got:

  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 780, in _import_object
    imported = self._import_object(refd, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 793, in _import_object
    raw_dict = {
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 794, in <dictcomp>
    k: self._import_object(v, reference_map, obj_stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/writer.py", line 774, in _import_object
    refd = obj.get_object()
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/generic.py", line 529, in get_object
    obj = self.reference.get_object()
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/generic.py", line 204, in get_object
    return self.pdf.get_object(self).get_object()
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/reader.py", line 425, in get_object
    obj = self._read_object(
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/reader.py", line 478, in _read_object
    retval = self._get_object_from_stream(
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/reader.py", line 291, in _get_object_from_stream
    obj = generic.read_object(
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/generic.py", line 246, in read_object
    result = DictionaryObject.read_from_stream(
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/generic.py", line 1273, in read_from_stream
    key = read_object(stream, container_ref)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/generic.py", line 240, in read_object
    result = NameObject.read_from_stream(stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/generic.py", line 1124, in read_from_stream
    name_bytes = read_until_delimiter(stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/misc.py", line 108, in read_until_delimiter
    result = _read_until_class(PDF_WHITESPACE + PDF_DELIMITERS, stream)
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/misc.py", line 125, in _read_until_class
    return b''.join(_build())
  File "/home/user/Documents/secure/venv-oldpyhanko/lib/python3.10/site-packages/pyhanko/pdf_utils/misc.py", line 120, in _build
    tok = stream.read(1)
RecursionError: maximum recursion depth exceeded while calling a Python object
Error: Generic processing error.

The full stacktrace is attached: stacktrace.txt

MatthiasValvekens commented 3 months ago

Hi @redfast00,

Those are all pretty interesting... There seems to be some kind of regression in 0.23.1 -> 0.23.2 regarding permission flags in files using the PubSec handler, but so far that appears to be unrelated to what you're seeing.

I see _get_object_from_stream appear in both stack traces, so I suspect this to be a funny interaction between object streams (AKA "compressed objects") and file encryption. I'll investigate further along those lines...

redfast00 commented 3 months ago

Thank you for looking at it; I understand that this can be frustrating if you don't have the files, but I unfortunately really can't share them. If you need me to try some things out, feel free to ask :smile:

MatthiasValvekens commented 3 months ago

Hmmm, I tried quite a few permutations but could not reproduce what you're seeing. Can you tell me which software (purportedly) created your PDFs? And can you post the content of the /Encrypt dictionary?

For the first document, can you try breakpointing the failure and check if (a) pyHanko is able to decrypt any strings / streams at all in that document, and (b) if it fails only on a specific one, whether that string "looks" encrypted? I'm starting to think that perhaps there's a string somewhere that has not been encrypted by mistake (I can think of some scenarios that could result in that kind of thing).

In the second issue, it appears that the public-key encryption part is working as it should: the bottom of the call stack seems to indicate that we're successfully looking inside an object stream somewhere, which requires said object stream to have been decrypted without issues. Rather, from the very high object ids in your stack trace, I think there's just a very deep path in the tree somewhere that is blowing up the stack (a particularly convoluted structure tree could do that, I suppose...). I didn't spot any loops in the stack trace (which _import_object defends against, FWIW). If the issue is genuinely due to the object graph just being too big, the fix would be to rewrite _import_object to not use recursion, which is of course possible but it requires some refactoring.
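As a rough illustration of what "not using recursion" means here, the sketch below copies a nested structure with an explicit stack instead of the Python call stack. It uses plain dicts and lists as stand-ins for PDF objects; copy_graph, depth and the seen map are hypothetical names for this example and are not pyHanko's actual importer code.

```python
def copy_graph(root):
    """Copy nested dicts/lists iteratively, using an explicit work stack."""
    if not isinstance(root, (dict, list)):
        return root
    new_root = {} if isinstance(root, dict) else []
    # maps id(source container) -> copied container, so shared or cyclic
    # references are copied once (loosely analogous to a reference map)
    seen = {id(root): new_root}
    stack = [(root, new_root)]
    while stack:
        src, dst = stack.pop()
        items = src.items() if isinstance(src, dict) else enumerate(src)
        for key, value in items:
            if isinstance(value, (dict, list)):
                child = seen.get(id(value))
                if child is None:
                    # create an empty shell now, fill it in later
                    child = {} if isinstance(value, dict) else []
                    seen[id(value)] = child
                    stack.append((value, child))
                value = child
            if isinstance(dst, dict):
                dst[key] = value
            else:
                dst.append(value)
    return new_root


def depth(obj):
    """Count nesting levels along the first value of each dict, iteratively."""
    count = 0
    while isinstance(obj, dict) and obj:
        obj = next(iter(obj.values()))
        count += 1
    return count


# Build a chain far deeper than Python's default recursion limit.
deep = {}
node = deep
for _ in range(20_000):
    child = {}
    node["k"] = child
    node = child

copied = copy_graph(deep)
print(depth(copied))
```

The memory needed for the pending work moves from the call stack to an ordinary list, so the depth of the object graph is limited only by available memory rather than by the interpreter's recursion limit.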

redfast00 commented 3 months ago

PDF metadata

In the PDF metadata, there is:

For the small document (4 pages) that crashes with the value-error:

'/Creator': 'Acrobat PDFMaker 9.1 for Word',
'/Producer': 'Adobe PDF Library 9.0',

For the big document, that crashes with the recursion error:

{'/Creator': 'FrameMaker 16.0.4',
 '/Producer': 'Acrobat Distiller 23.0 (Windows)'}

(identifying information removed)

Can it decrypt any strings?

By breakpointing, I was able to verify that the shared key comes out the same as in the working patch to pypdf. It is indeed able to decrypt strings: most of the plaintexts look like hex garbage, but there is a pattern in them (they start with the same character), and some plaintexts contain readable metadata.

At the point where it fails in aes_cbc_decrypt, data is b'', so plaintext is also b'' and there is presumably no padding left to strip. I worked around this by adding a check for zero-length encrypted data (it then just returns b'', as pypdf does), and then saw that it was indeed trying to decrypt something that looks like a certificate. I added a check to return that data as-is instead of decrypting it; the new error I get is:

2024-03-26 10:48:33,584 - root - DEBUG - Running with --verbose
2024-03-26 10:48:33,584 - root - DEBUG - There was no configuration to parse.
Key passphrase: 
2024-03-26 10:48:39,451 - tzlocal - DEBUG - /etc/timezone found, contents:
 Europe/Brussels

2024-03-26 10:48:39,451 - tzlocal - DEBUG - /etc/localtime found
2024-03-26 10:48:39,452 - tzlocal - DEBUG - 2 found:
 {'/etc/timezone': 'Europe/Brussels', '/etc/localtime is a symlink to': 'Europe/Brussels'}
2024-03-26 10:48:39,454 - cli - ERROR - Generic processing error.
Traceback (most recent call last):
  File "/home/user/Documents/secure/pyHanko/pyhanko/cli/runtime.py", line 50, in pyhanko_exception_manager
    yield
  File "/home/user/Documents/secure/pyHanko/pyhanko/cli/commands/crypt.py", line 189, in _decrypt_pubkey
    w.write(outf)
  File "/home/user/Documents/secure/pyHanko/pyhanko/pdf_utils/writer.py", line 608, in write
    self._write(stream)
  File "/home/user/Documents/secure/pyHanko/pyhanko/pdf_utils/writer.py", line 624, in _write
    self._write_objects(stream, object_positions)
  File "/home/user/Documents/secure/pyHanko/pyhanko/pdf_utils/writer.py", line 512, in _write_objects
    obj.write_to_stream(stream, handler, container_ref)
  File "/home/user/Documents/secure/pyHanko/pyhanko/pdf_utils/generic.py", line 1249, in write_to_stream
    value.write_to_stream(stream, handler, container_ref)
  File "/home/user/Documents/secure/pyHanko/pyhanko/pdf_utils/generic.py", line 487, in write_to_stream
    data.write_to_stream(
  File "/home/user/Documents/secure/pyHanko/pyhanko/pdf_utils/generic.py", line 1249, in write_to_stream
    value.write_to_stream(stream, handler, container_ref)
AttributeError: 'Reference' object has no attribute 'write_to_stream'
Error: Generic processing error.

Second document

This document is indeed pretty large (334 pages, 16M), so it's possible that the object graph is indeed too big.

Content of /Encrypt dictionary

This is identifying information, I'm afraid.

MatthiasValvekens commented 3 months ago

Thanks! That helps a lot. I'm now pretty confident that the issues you're seeing are not related to public-key encryption as such (that seems to work); they have more to do with decrypting the actual objects in the file(s).

At the point where it fails in aes_cbc_decrypt, data is b'', so plaintext is also b'' and there is presumably no padding left to strip.

Hmm, this sounds like a bug in the software that produced the document, to be honest. For disambiguation purposes, PKCS#7 padding requires adding a full block of padding bytes even when the message length is an integer multiple of the block size (and AFAIK an empty message is not an exception to this rule). But OK, that's something we can try to tolerate.
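To make the padding rule concrete, here is a minimal pure-Python sketch of PKCS#7 padding for a 16-byte block. pkcs7_pad and pkcs7_unpad are illustrative helpers written for this example, not functions from pyHanko or the cryptography package; the error message merely mirrors the one in the traceback above.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(message: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    # Always add between 1 and block_size bytes, each equal to the pad length.
    pad_len = block_size - (len(message) % block_size)
    return message + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    # Valid padded data is never empty and is a whole number of blocks.
    if not padded or len(padded) % block_size != 0:
        raise ValueError("Invalid padding bytes.")
    pad_len = padded[-1]
    if not 1 <= pad_len <= block_size:
        raise ValueError("Invalid padding bytes.")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("Invalid padding bytes.")
    return padded[:-pad_len]


# An empty message still pads to one full block of 0x10 bytes ...
print(pkcs7_pad(b""))
# ... and a message that is exactly one block long gains a second block.
print(len(pkcs7_pad(b"A" * 16)))
```

Under this rule a correctly padded plaintext is never empty, which is why a strict unpadder rejects b'' rather than returning it.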

AttributeError: 'Reference' object has no attribute 'write_to_stream'

This error is quite curious. Out of curiosity: where did you put that workaround to check for empty strings?

This document is indeed pretty large (334 pages, 16M), so it's possible that the object graph is indeed too big.

Mmkay, I'll look into rewriting _import_object to not work recursively, then.

This is identifying information, I'm afraid.

Fair enough; I think we've ruled out the Encrypt dictionary as a source of problems anyway :).

By the way: are these the only two public-key encrypted documents you have tried, or are there others?

MatthiasValvekens commented 3 months ago

Hi @redfast00,

I think I managed to hunt down all the issues you saw: the object graph thing, the padding issue, and also the AttributeError: 'Reference' object has no attribute 'write_to_stream' error you encountered (coincidentally, I stumbled upon that one while fixing the object importing logic).

Can you try again with the code in #414?

redfast00 commented 3 months ago

Out of curiosity: where did you put that workaround to check for empty strings?

In aes_cbc_decrypt; but I just checked if len(data) == 0: return data. That's also where I put the workaround for the certificate-looking data that got decrypted but shouldn't be (I think).

By the way: are these the only two public-key encrypted documents you have tried, or are there others?

Correct, these are the only two I've tried in pyHanko.

Can you try again with the code in https://github.com/MatthiasValvekens/pyHanko/pull/414?

Yes! I'll do that tomorrow; thanks for trying to fix this :+1:

redfast00 commented 3 months ago

@MatthiasValvekens good news and bad news!

With the code from your PR, I still get:

Traceback (most recent call last):
  File "/home/user/Documents/secure/pyHanko/pyhanko/cli/runtime.py", line 50, in pyhanko_exception_manager
    yield
  File "/home/user/Documents/secure/pyHanko/pyhanko/cli/commands/crypt.py", line 187, in _decrypt_pubkey
    w = copy_into_new_writer(r)
  File "/home/user/Documents/secure/pyHanko/pyhanko/pdf_utils/writer.py", line 1293, in copy_into_new_writer
    new_root_dict = importer.import_object(input_handler.root)
  File "/home/user/Documents/secure/pyHanko/pyhanko/pdf_utils/writer.py", line 1173, in import_object
    imported = self._ingest(source_obj)
  File "/home/user/Documents/secure/pyHanko/pyhanko/pdf_utils/writer.py", line 1195, in _ingest
    raw_dict = {
  File "/home/user/Documents/secure/pyHanko/pyhanko/pdf_utils/writer.py", line 1196, in <dictcomp>
    k: self._ingest(v) for k, v in obj.items() if k != '/Metadata'
  File "/home/user/Documents/secure/pyHanko/pyhanko/pdf_utils/writer.py", line 1191, in _ingest
    obj = obj.decrypted
  File "/home/user/Documents/secure/pyHanko/pyhanko/pdf_utils/generic.py", line 2067, in decrypted
    decrypted = pdf_string(cf.decrypt(local_key, obj.original_bytes))
  File "/home/user/Documents/secure/pyHanko/pyhanko/pdf_utils/crypt/filter_mixins.py", line 134, in decrypt
    return aes_cbc_decrypt(key, data, iv)
  File "/home/user/Documents/secure/pyHanko/pyhanko/pdf_utils/crypt/_util.py", line 16, in aes_cbc_decrypt
    plaintext = decryptor.update(data) + decryptor.finalize()
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/cryptography/hazmat/primitives/ciphers/base.py", line 184, in finalize
    data = self._ctx.finalize()
  File "/home/user/Documents/secure/venv/lib/python3.10/site-packages/cryptography/hazmat/backends/openssl/ciphers.py", line 223, in finalize
    raise ValueError(
ValueError: The length of the provided data is not a multiple of the block length.

but I already got this error before; if I change the AES decrypt function to:

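from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def aes_cbc_decrypt(key, data, iv, use_padding=True):
    # debugging hack: 8636 is the length of the one blob in my files that
    # turned out not to be encrypted, so it is passed through untouched
    if len(data) == 8636:
        print("WARNING: special case for somehow not encrypted data")
        print(data)
        return data
    cipher = Cipher(algorithms.AES(key), modes.CBC(iv))
    decryptor = cipher.decryptor()
    plaintext = decryptor.update(data) + decryptor.finalize()

    # we tolerate empty messages that don't have padding
    if use_padding and len(plaintext) > 0:
        unpadder = padding.PKCS7(128).unpadder()
        return unpadder.update(plaintext) + unpadder.finalize()
    else:
        return plaintext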

then the small PDF decrypts fine, and seems to look okay :tada:

I also tested the large PDF with the same code. It now decrypts as well, and it hits the same 'somehow not encrypted data' path as the small PDF (with the exact same length).

If it is useful to you, I think the 'somehow not encrypted data' is a certificate, padded with null-bytes. It is clearly plaintext, since I can see the string This certificate has been issued in accordance with the GlobalSign CPS located at https://www.globalsign.com in the data. pypdf with my patch does not error on this, so I think it should be possible to somehow see the difference between data that needs to be decrypted and data that doesn't (with a better heuristic than just looking at the ciphertext length and seeing if it is a multiple of the block length).

MatthiasValvekens commented 3 months ago

Hi @redfast00,

Aha, I suspect that your input files contain digital signatures... :D. Does the object reside in a key named /Contents?

That would also immediately explain a lot of things:

Anyway, if that's what's going on, then the mystery as to why you're hitting all those wonky corner cases is also solved ;).

I can probably make the importer aware of DigSig data in the input file and make it deal with those things gracefully (but the signatures will not be valid for the decrypted output either way, so be mindful of that).
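A rough sketch of that idea, assuming (as I understand the PDF specification) that the /Contents string of a signature dictionary is stored unencrypted even in an encrypted file. Plain dicts and bytes stand in for PDF objects here; is_signature_dict, import_dict and the decrypt callback are hypothetical names for this example and do not reflect the code in #414.

```python
def is_signature_dict(d: dict) -> bool:
    # /Type is optional on signature dictionaries, so also treat the
    # presence of /ByteRange as a hint (an assumption of this sketch).
    return d.get("/Type") in ("/Sig", "/DocTimeStamp") or "/ByteRange" in d


def import_dict(d: dict, decrypt) -> dict:
    """Copy a dictionary, decrypting every string except signature contents."""
    out = {}
    for key, value in d.items():
        if isinstance(value, bytes):
            if key == "/Contents" and is_signature_dict(d):
                # signature blob: pass through as-is, never decrypt
                out[key] = value
            else:
                out[key] = decrypt(value)
        else:
            out[key] = value
    return out


def fake_decrypt(data: bytes) -> bytes:
    # stand-in for a real cipher: just reverse the bytes
    return data[::-1]


sig = {"/Type": "/Sig", "/Contents": b"\x30\x82", "/Reason": b"abc"}
print(import_dict(sig, fake_decrypt))
```

The decision is made from the surrounding dictionary rather than from the ciphertext itself, which avoids heuristics such as checking whether the data length is a multiple of the block size.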

redfast00 commented 3 months ago

@MatthiasValvekens the input file indeed contains a digital signature: it is signed + encrypted.

I don't know what key this object resides in; I couldn't immediately find a way to get the key from the AES decryption function.

but the signatures will not be valid for the decrypted output either way, so be mindful of that

I really don't mind, I'd already be very happy to be able to read my document in an open-source PDF reader (instead of having to boot Windows just to use Adobe Reader).

As feedback from a user, for me it was not immediately clear from the README that pyHanko could be used to decrypt PDF documents; maybe this is useful to add to the README/documentation?

MatthiasValvekens commented 3 months ago

I can probably make the importer aware of DigSig data in the input file and make it deal with those things gracefully (but the signatures will not be valid for the decrypted output either way, so be mindful of that).

I've implemented, tested and merged this (see #414). Can you give it a try on your files? If that works, then we can close this issue as well.

As feedback from a user, for me it was not immediately clear from the README that pyHanko could be used to decrypt PDF documents; maybe this is useful to add to the README/documentation?

Right, the README mentions encryption support but does not explicitly call out encryption/decryption of files as a feature. That's indeed worth some clarification. Thanks!

redfast00 commented 3 months ago

Works perfectly, thanks for fixing this!