chinapandaman / PyPDFForm

:fire: The Python library for PDF forms.
https://chinapandaman.github.io/PyPDFForm/
MIT License

Error when dealing with integer / radio button #626

Open MattMuffin opened 4 months ago

MattMuffin commented 4 months ago

The error appears with both PdfWrapper and FormWrapper, but the behavior differs between the two.

Here is the test pdf file: https://www.cdc.gov/infectioncontrol/pdf/icar/IPC-demo-LTC-508.pdf

Loading the file shows this in the console:

Ignoring wrong pointing object 14 0 (offset 0)
Ignoring wrong pointing object 89 0 (offset 0)

EDIT: this could be the issue: https://github.com/py-pdf/pypdf/issues/2326

Calling the schema method:

from PyPDFForm import PdfWrapper
pdf_form_schema = PdfWrapper("IPC-demo-LTC-508.pdf").schema

shows that the field "S1 GF 7" is of type "integer" (a radio button), with a maximum value of 2.
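
As a rough illustration of how that maximum reads, here is a minimal sketch that turns a reported maximum into an option count. It assumes the schema is a JSON-schema-style dict with a "properties" key; the `reported_schema` literal is a hand-written excerpt mirroring the values quoted above, not real library output, and `radio_option_count` is a hypothetical helper.

```python
def radio_option_count(schema: dict, field: str) -> int:
    """Return how many radio options the schema reports for `field`."""
    spec = schema["properties"][field]
    if spec.get("type") != "integer":
        raise ValueError(f"{field!r} is not an integer (radio) field")
    # Indices start at 0, so a maximum of N means N + 1 selectable options.
    return spec["maximum"] + 1


# Hand-written excerpt mirroring what is reported above (assumed layout).
reported_schema = {
    "type": "object",
    "properties": {"S1 GF 7": {"type": "integer", "maximum": 2}},
}

print(radio_option_count(reported_schema, "S1 GF 7"))  # 3
```

A result of 3 is one short of the 4 options visible in the form.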

Problem 1: why is the maximum 2 when there are clearly 4 possible options? The values start at 0 = "Acute", 1 = "Long-term", 2 = "Outpatient", but setting 3 (which should be "Other") does nothing.

Problem 2: with FormWrapper, setting the field "S1 GF 7" to any integer fails with the following traceback:

Traceback (most recent call last):
  File "~/.local/share/virtualenvs/test_auto_fill-9T0hmLRz/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 600, in _run_script
    exec(code, module.__dict__)
  File "~/Documents/Projets/test_auto_fill/test.py", line 7, in <module>
    filled = FormWrapper("IPC-demo-LTC-508.pdf").fill(
  File "~/.local/share/virtualenvs/test_auto_fill-9T0hmLRz/lib/python3.9/site-packages/PyPDFForm/wrapper.py", line 60, in fill
    self.stream = simple_fill(
  File "~/.local/share/virtualenvs/test_auto_fill-9T0hmLRz/lib/python3.9/site-packages/PyPDFForm/filler.py", line 192, in simple_fill
    simple_update_radio_value(annot)
  File "~/.local/share/virtualenvs/test_auto_fill-9T0hmLRz/lib/python3.9/site-packages/PyPDFForm/patterns.py", line 109, in simple_update_radio_value
    for each in annot[AP][D]:  # noqa
  File "~/.local/share/virtualenvs/test_auto_fill-9T0hmLRz/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 409, in __getitem__
    return dict.__getitem__(self, key).get_object()
KeyError: '/D'

Problem 3: with FormWrapper, setting the field "S1 GF 7" to a string, such as '0', always checks the option "Other".

Problem 4: with PdfWrapper, setting the field "S1 GF 7" to a string does nothing (a silent failure?), while setting it to an integer between 0 and 2 checks the corresponding option. I have not found a way to check the "Other" option. I expected strings to behave the same way as with FormWrapper.
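
The KeyError in Problem 2 comes from indexing `annot[AP][D]` on an annotation whose appearance dictionary has no `/D` entry. A minimal sketch of a tolerant lookup is below. Plain dicts stand in for the pypdf objects, and `appearance_states` is a hypothetical helper, so this shows the general idea only and is not necessarily how PyPDFForm handles it.

```python
AP = "/AP"
D = "/D"


def appearance_states(annot: dict) -> list:
    """Return the keys under /AP -> /D, or [] when either level is missing."""
    return list(annot.get(AP, {}).get(D, {}))


# Stand-in annotations (assumed shapes): one has /D, the other only /N.
with_d = {AP: {D: {"/0": None, "/Off": None}}}
without_d = {AP: {"/N": {"/0": None, "/Off": None}}}

print(appearance_states(with_d))     # ['/0', '/Off']
print(appearance_states(without_d))  # []
```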

Environment:

Python 3.9.6 pip 23.3.2

altair==5.3.0
attrs==23.2.0
blinker==1.8.2
cachetools==5.3.3
certifi==2024.2.2
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
cryptography==42.0.7
gitdb==4.0.11
GitPython==3.1.43
idna==3.7
Jinja2==3.1.4
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
numpy==1.26.4
packaging==24.0
pandas==2.2.2
pillow==10.3.0
protobuf==4.25.3
pyarrow==16.0.0
pycparser==2.22
pydeck==0.9.0
Pygments==2.18.0
pypdf==4.2.0
PyPDFForm==1.4.24
python-dateutil==2.9.0.post0
pytz==2024.1
referencing==0.35.1
reportlab==4.2.0
requests==2.31.0
rich==13.7.1
rpds-py==0.18.1
six==1.16.0
smmap==5.0.1
streamlit==1.34.0
tenacity==8.3.0
toml==0.10.2
toolz==0.12.1
tornado==6.4
typing_extensions==4.11.0
tzdata==2024.1
urllib3==2.2.1

chinapandaman commented 4 months ago

Hey, thanks for posting.

Problem 1: Might be an environment issue? See the screenshot (Screenshot 2024-05-12 134558) and this test case.

Problem 2: This should now be fixed in v1.4.25. If you are interested these are the test cases.

Problem 3: Do not use strings for radio buttons. Always use integers; once you do, this should also work now that problem 2 is fixed.

Problem 4: Again, do not use strings. I'm not sure why you are only able to select up to the third radio button in your environment, though. Test case for the fourth radio button.
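
To make the "integers only" advice concrete, here is a minimal sketch of a pre-fill check that flags string values for radio fields before they reach the library. `check_radio_values` is a hypothetical helper, and the schema literal is hand-written on the assumption that the field reports all four options (a maximum of 3); neither comes from PyPDFForm itself.

```python
def check_radio_values(schema: dict, data: dict) -> list:
    """Return a list of problems found in `data` for integer-typed fields."""
    problems = []
    for name, value in data.items():
        spec = schema.get("properties", {}).get(name, {})
        if spec.get("type") != "integer":
            continue  # only radio-style (integer) fields are checked here
        max_value = spec.get("maximum")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name}: expected int, got {type(value).__name__}")
        elif value < 0 or (max_value is not None and value > max_value):
            problems.append(f"{name}: {value} is out of range")
    return problems


# Hypothetical schema: four radio options, indexed 0 through 3.
schema = {"properties": {"S1 GF 7": {"type": "integer", "maximum": 3}}}

print(check_radio_values(schema, {"S1 GF 7": "0"}))  # ['S1 GF 7: expected int, got str']
print(check_radio_values(schema, {"S1 GF 7": 3}))    # []
```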

PS: I recorded the whole process of me working through your problems if you are interested: https://www.youtube.com/watch?v=Du6CI0ndnjU

MattMuffin commented 4 months ago

Hello, thank you for the quick fix. I'm trying to track down the cause of problem 1. What is your environment?

chinapandaman commented 4 months ago

I did everything inside the GitHub Codespaces of this repo. CPU info: Screenshot 2024-05-13 154401

Python version: 3.10.13 also seen from the above screenshot.

Running pip freeze gives:

anyio==4.3.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
astroid==3.1.0
asttokens==2.4.1
async-lru==2.0.4
attrs==23.2.0
Babel==2.14.0
beautifulsoup4==4.12.3
bleach==6.1.0
certifi==2024.2.2
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
colorama==0.4.6
comm==0.2.2
contourpy==1.2.1
coverage==7.5.1
cryptography==42.0.6
cycler==0.12.1
debugpy==1.8.1
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.8
exceptiongroup==1.2.0
executing==2.0.1
fastjsonschema==2.19.1
filelock==3.13.3
fonttools==4.50.0
fqdn==1.5.1
fsspec==2024.3.1
ghp-import==2.1.0
gitdb==4.0.11
GitPython==3.1.43
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
idna==3.6
iniconfig==2.0.0
ipykernel==6.29.4
ipython==8.23.0
isoduration==20.11.0
isort==5.13.2
jedi==0.19.1
Jinja2==3.1.3
joblib==1.3.2
json5==0.9.24
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter-events==0.10.0
jupyter-lsp==2.2.4
jupyter-server-mathjax==0.2.6
jupyter_client==8.6.1
jupyter_core==5.7.2
jupyter_server==2.13.0
jupyter_server_terminals==0.5.3
jupyterlab==4.1.5
jupyterlab_git==0.50.0
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.4
kiwisolver==1.4.5
Markdown==3.6
MarkupSafe==2.1.5
matplotlib==3.8.3
matplotlib-inline==0.1.6
mccabe==0.7.0
mergedeep==1.3.4
mistune==3.0.2
mkdocs==1.6.0
mkdocs-get-deps==0.2.0
mpmath==1.3.0
nbclient==0.10.0
nbconvert==7.16.3
nbdime==4.0.1
nbformat==5.10.3
nest-asyncio==1.6.0
networkx==3.2.1
notebook_shim==0.2.4
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.4.99
nvidia-nvtx-cu12==12.1.105
overrides==7.7.0
packaging==24.0
pandas==2.2.1
pandocfilters==1.5.1
parso==0.8.3
pathspec==0.12.1
pexpect==4.9.0
pillow==10.3.0
platformdirs==4.2.0
plotly==5.20.0
pluggy==1.5.0
prometheus_client==0.20.0
prompt-toolkit==3.0.43
psutil==5.9.8
ptyprocess==0.7.0
pudb==2014.1
pure-eval==0.2.2
pycparser==2.22
Pygments==2.17.2
pylint==3.1.0
pyparsing==3.1.2
pypdf==4.2.0
pytest==8.2.0
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
pytz==2024.1
PyYAML==6.0.1
pyyaml_env_tag==0.1
pyzmq==25.1.2
referencing==0.34.0
reportlab==4.2.0
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.18.0
scikit-learn==1.4.1.post1
scipy==1.13.0
seaborn==0.13.2
Send2Trash==1.8.2
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
soupsieve==2.5
stack-data==0.6.3
sympy==1.12
tenacity==8.2.3
terminado==0.18.1
threadpoolctl==3.4.0
tinycss2==1.2.1
tomli==2.0.1
tomlkit==0.12.4
torch==2.2.2
tornado==6.4
traitlets==5.14.2
triton==2.2.0
types-python-dateutil==2.9.0.20240316
typing_extensions==4.10.0
tzdata==2024.1
uri-template==1.3.0
urllib3==2.2.1
urwid==2.6.11
watchdog==4.0.0
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0

Screenshot 2024-05-13 154441

Also, the test case I mentioned earlier was run as part of the CI on three different operating systems (ubuntu-latest, windows-latest, and macos-latest) across 5 Python versions, and it passed on all of them. So I'm actually very confused about what kind of environment would lead to your issue.
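
One way to narrow down an environment difference is to diff the two `pip freeze` outputs posted in this thread. The sketch below is a minimal, stdlib-only example with hypothetical helper names (`parse_freeze`, `version_differences`); the sample strings are short excerpts of the two lists above, not the full environments. Both lists pin pypdf==4.2.0 and reportlab==4.2.0, so those drop out of the diff.

```python
def parse_freeze(text: str) -> dict:
    """Map lower-cased package names to pinned versions from `pip freeze` text."""
    pins = {}
    for entry in text.split():  # handles one-per-line and space-separated lists
        name, sep, version = entry.partition("==")
        if sep:
            pins[name.lower()] = version
    return pins


def version_differences(a: dict, b: dict) -> dict:
    """Return {package: (version_in_a, version_in_b)} for shared packages that differ."""
    return {n: (a[n], b[n]) for n in sorted(a.keys() & b.keys()) if a[n] != b[n]}


# Excerpts from the two environments posted in this thread.
reporter = parse_freeze(
    "cryptography==42.0.7 idna==3.7 pypdf==4.2.0 reportlab==4.2.0 PyPDFForm==1.4.24"
)
maintainer = parse_freeze(
    "cryptography==42.0.6\nidna==3.6\npypdf==4.2.0\nreportlab==4.2.0"
)

print(version_differences(reporter, maintainer))
# {'cryptography': ('42.0.7', '42.0.6'), 'idna': ('3.7', '3.6')}
print(sorted(reporter.keys() - maintainer.keys()))
# ['pypdfform']
```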