Davtax / arXiv-sorter

Sort the daily arXiv mailing list by user keywords and output the manuscripts as nicely formatted Markdown.
MIT License

Bug with pymupdf #44

Open Davtax opened 1 week ago

Davtax commented 1 week ago

When cropping the manuscripts of 2024-11-15 for the categories cond-mat and quant-ph, the following error appears:

  File "run.py", line 4, in <module>
  File "app/main.py", line 157, in main
  File "app/pdf_scrapper.py", line 173, in get_images_pdf_scrapper
  File "app/pdf_scrapper.py", line 93, in extract_from_json
  File "app/pdf_scrapper.py", line 72, in _extract_region
  File "pymupdf/__init__.py", line 9722, in set_cropbox
  File "pymupdf/__init__.py", line 8300, in _set_pagebox
ValueError: CropBox not in MediaBox
[PYI-59851:ERROR] Failed to execute script 'run' due to unhandled exception!

Davtax commented 1 week ago

The problem comes from pdffigures2, which sometimes detects figures that extend outside the page boundaries. PyMuPDF then refuses to set a CropBox that is not contained in the MediaBox. For the moment, I have worked around this by catching the error and not showing any image (#45).

I may need to look for other forks of pdffigures2 that solve this problem.
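
An alternative to dropping the image could be to clip the detected region to the page before cropping. The following is only a minimal sketch under stated assumptions, not the code used in this repository: `clamp_region`, `Box`, and `min_size` are hypothetical names, and the detected box is assumed to be given as `(x0, y0, x1, y1)` in the same coordinate system as the page's MediaBox.

```python
from typing import Optional, Tuple

# Hypothetical box type: (x0, y0, x1, y1), same coordinates as the MediaBox.
Box = Tuple[float, float, float, float]


def clamp_region(region: Box, page_box: Box, min_size: float = 1.0) -> Optional[Box]:
    """Clip `region` to `page_box`; return None if almost nothing is left."""
    x0 = max(region[0], page_box[0])
    y0 = max(region[1], page_box[1])
    x1 = min(region[2], page_box[2])
    y1 = min(region[3], page_box[3])
    # A region lying fully outside the page collapses to zero or negative size.
    if x1 - x0 < min_size or y1 - y0 < min_size:
        return None
    return (x0, y0, x1, y1)


# Example page box (US Letter in points) and two detected regions.
page_box = (0.0, 0.0, 612.0, 792.0)
# Spills past the right and bottom edges: gets clipped to the page.
print(clamp_region((500.0, 700.0, 650.0, 820.0), page_box))
# Lies entirely to the right of the page: nothing usable remains.
print(clamp_region((700.0, 100.0, 800.0, 200.0), page_box))
```

With PyMuPDF, the clipped box could then presumably be passed on as `page.set_cropbox(pymupdf.Rect(*box))`, and a `None` result handled the same way as the current workaround (no image shown). Whether the coordinates reported by pdffigures2 match the MediaBox origin and orientation would need to be checked first.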