I've noticed that when I split my PDF in Firefox to get a smaller PDF (e.g. the first 10 pages), openparse won't extract any nodes. The original PDF is extracted fine.
When I specify table_args, the parser returns some nodes, but all of them are identified as tables.
I am attaching the PDF; perhaps someone could take a look at what's wrong.
concept-vp4360-cz.pdf
Example Code
No response
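No code was supplied with the report. A minimal reproduction might look like the sketch below. It assumes the usual `openparse.DocumentParser(...).parse(path)` entry point and that each returned node carries a `variant` set of labels such as "text" or "table". The helper names, the file names and the `table_args` value in the usage note are hypothetical, since the report does not say which ones were used.

```python
from collections import Counter


def count_variants(nodes):
    """Count how many nodes carry each variant label (e.g. "text", "table")."""
    counts = Counter()
    for node in nodes:
        # Assumption: every node exposes a `variant` collection of labels.
        for label in getattr(node, "variant", ()):
            counts[str(label)] += 1
    return dict(counts)


def parse_and_summarize(pdf_path, table_args=None):
    """Parse one PDF and return (number of nodes, per-variant counts)."""
    # Imported lazily so count_variants can be used without open-parse installed.
    import openparse

    parser = openparse.DocumentParser(table_args=table_args)
    parsed = parser.parse(pdf_path)
    return len(parsed.nodes), count_variants(parsed.nodes)
```

Calling `parse_and_summarize` on the original file and on the Firefox-split file, once without `table_args` and once with something like `{"parsing_algorithm": "pymupdf"}` (an assumed value), should show the difference described above: zero nodes for the split file by default, and only table nodes once `table_args` is set.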
Python, open-parse & OS Version
python_version: 3.12.7
operating_system: Linux
os_version: 6.11.8-arch1-2
open-parse version: 0.7.0
python version: 3.12.7 (main, Oct 1 2024, 11:15:50) [GCC 14.2.1 20240910]
platform: Linux-6.11.8-arch1-2-x86_64-with-glibc2.40
related packages: torchvision-0.20.1 tokenizers-0.20.3 torch-2.5.1 pydantic-2.9.2 PyMuPDF-1.24.13 transformers-4.46.2