MatthiasValvekens / pyHanko

pyHanko: sign and stamp PDF files
MIT License

ValueError: invalid literal for int() with base 10: '' while signing file #405

Open fers490 opened 4 months ago

fers490 commented 4 months ago

Describe the bug

I'm getting the following error when trying to sign a document:

  File ".venv/lib/python3.11/site-packages/pyhanko/sign/signers/functions.py", line 150, in async_sign_pdf
    return await pdf_signer.async_sign_pdf(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/sign/signers/pdf_signer.py", line 1505, in async_sign_pdf
    signing_session = self.init_signing_session(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/sign/signers/pdf_signer.py", line 1186, in init_signing_session
    sig_field_ref = next(cms_writer)
                    ^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/sign/signers/cms_embedder.py", line 450, in write_cms
    field_created, sig_field_ref = _get_or_create_sigfield(
                                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/sign/signers/cms_embedder.py", line 125, in _get_or_create_sigfield
    field_created, sig_field_ref = prepare_sig_field(
                                   ^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/sign/fields.py", line 1483, in prepare_sig_field
    field_name, value, sig_field_ref = next(candidates)
                                       ^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/sign/fields.py", line 1637, in enumerate_sig_fields_in
    field = field_ref.get_object()
            ^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/pdf_utils/generic.py", line 529, in get_object
    obj = self.reference.get_object()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/pdf_utils/generic.py", line 204, in get_object
    return self.pdf.get_object(self).get_object()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/pdf_utils/reader.py", line 425, in get_object
    obj = self._read_object(
          ^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/pdf_utils/reader.py", line 493, in _read_object
    retval = generic.read_object(
             ^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/pdf_utils/generic.py", line 246, in read_object
    result = DictionaryObject.read_from_stream(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/pdf_utils/generic.py", line 1279, in read_from_stream
    value = read_object(stream, container_ref)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/pdf_utils/generic.py", line 246, in read_object
    result = DictionaryObject.read_from_stream(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/pdf_utils/generic.py", line 1279, in read_from_stream
    value = read_object(stream, container_ref)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/pdf_utils/generic.py", line 246, in read_object
    result = DictionaryObject.read_from_stream(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/pdf_utils/generic.py", line 1276, in read_from_stream
    key = read_object(stream, container_ref)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/pdf_utils/generic.py", line 281, in read_object
    result = NumberObject.read_from_stream(stream)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/pdf_utils/generic.py", line 697, in read_from_stream
    return NumberObject(num.decode('ascii'))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pyhanko/pdf_utils/generic.py", line 668, in __new__
    val = int(value)
          ^^^^^^^^^^
ValueError: invalid literal for int() with base 10: ''

To Reproduce

After some investigation, I found that the problematic file contains a malformed dictionary: one of its entries has a value but no key. This is the offending field object in the file:

obj\n<</AP<</D<</2958 0 R/Off 2959 0 R>>/N<</2956 0 R/Off 2957 0 R>>>>/AS/Off/BS<</S/I/W 1>>/DA(/ZaDb 6 Tf 0 g)/F 6/FT/Btn/MK<</CA(4)>>/P 2773 0 R/Rect[20.4649 4.80179 31.293 16.2668]/StructParent 1221/Subtype/Widget/T(page_exists_T5)/TU(page 5 exists)/Type/Annot>>\nendobj

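The dictionary after /D (<</2958 0 R/Off 2959 0 R>>) appears to be missing the name of its first key: the indirect reference 2958 0 R, which looks like it should be the value, directly follows the slash, and this leads to the parsing error. The dictionary after /N (<</2956 0 R/Off 2957 0 R>>) has the same defect.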
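
To illustrate why a parser trips over this dictionary, here is a toy sketch. It is not pyHanko's parser and makes simplifying assumptions: the dictionary is flat, and each value is either a single token or an indirect reference of the form `N G R`. The function name `parse_flat_dict` and the `/On` key in the well-formed example are hypothetical, chosen only for illustration.

```python
import re

# Toy tokenizer (assumption: only names, integers, the keyword R and
# dictionary delimiters occur in the fragment).
TOKEN = re.compile(rb"<<|>>|/[^\s/<>\[\]()]*|\d+|R")


def parse_flat_dict(fragment: bytes) -> dict:
    """Parse a flat PDF-style dictionary into {name: value}.

    Indirect references "N G R" become (N, G) tuples; any other value
    is kept as its raw token.
    """
    tokens = TOKEN.findall(fragment)
    if tokens[:1] != [b"<<"] or tokens[-1:] != [b">>"]:
        raise ValueError("not a dictionary")
    body = tokens[1:-1]
    result = {}
    i = 0
    while i < len(body):
        key = body[i]
        # Dictionary keys must be names, i.e. start with a slash.
        if not key.startswith(b"/"):
            raise ValueError(f"expected a name as key, got {key!r}")
        ref = body[i + 1:i + 4]
        if (len(ref) == 3 and ref[0].isdigit()
                and ref[1].isdigit() and ref[2] == b"R"):
            result[key] = (int(ref[0]), int(ref[1]))
            i += 4
        elif i + 1 < len(body):
            result[key] = body[i + 1]
            i += 2
        else:
            raise ValueError(f"key {key!r} has no value")
    return result


# Well-formed variant with a hypothetical /On key in front of the reference.
print(parse_flat_dict(b"<</On 2958 0 R/Off 2959 0 R>>"))

# The fragment from the file: /2958 is read as the key and 0 as its value,
# so the next "key" is the bare keyword R.
try:
    parse_flat_dict(b"<</2958 0 R/Off 2959 0 R>>")
except ValueError as exc:
    print("malformed:", exc)
```

In this toy model the failure surfaces when the stray `R` is read in key position. That matches the traceback above, where the exception is raised while reading a key, although the real parser reports it as a failed integer conversion.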

Expected behavior

The malformed file opens correctly in document readers, and I assume the malformed field is not needed for signing, so it might be possible to skip it.
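
A minimal sketch of the "skip it" behaviour I have in mind, written as generic Python rather than against pyHanko's internals. The helper `iter_readable` and the `(label, loader)` pairs are hypothetical stand-ins for however the library enumerates field references; whether skipping unreadable fields is acceptable in a signing context is of course the maintainer's call.

```python
import logging
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def iter_readable(
    loaders: Iterable[Tuple[str, Callable[[], T]]]
) -> Iterator[Tuple[str, T]]:
    """Yield (label, object) for every loader that succeeds.

    Loaders that raise ValueError (as the failed integer conversion in
    the traceback does) are logged and skipped instead of aborting the
    whole enumeration.
    """
    for label, load in loaders:
        try:
            yield label, load()
        except ValueError as exc:
            logger.warning("skipping unreadable field %s: %s", label, exc)


# Hypothetical usage: the second loader fails the same way the parser does.
fields = [
    ("Signature1", lambda: {"FT": "/Sig"}),
    ("page_exists_T5", lambda: int("")),
]
for name, obj in iter_readable(fields):
    print(name, obj)
```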

Environment (please complete the following information):