Closed by PhilterPaper 1 month ago.
...and if all you're doing is creating an image of the first page, it might be even easier to open the PDF in GIMP (File > Open imports a chosen page; then crop to size and scale) and export it as a JPEG or PNG image. If you plan to do a lot of these, you might be able to script the whole process (GIMP supports scripting through Script-Fu and Python-Fu).
In ssimms/pdfapi2/issues/78, @neffets reported that an attempt to read a PDF into PDF::API2 appears to cause an infinite loop with ever-growing memory usage:
PDF::API2->from_string causes for action "page1-to-thumbnail" an OOM (memory-leak, loop)
We have a normal PDF created with "Acrobat PDFMaker for Word". It has only 6 pages.
We are trying to generate a thumbnail from the PDF using PDF::API2.
It hangs forever on line 1 (the first from_string()), with resident memory growing over time (about 1 GB more every 300 seconds).
The workaround is to wrap the from_string() call with a POSIX::sigaction handler and alarm(10).
Attachments: 2024Q1-digitale-Veranstaltungen_de.pdf, tixA78.pl.txt
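The reporter's workaround (a POSIX::sigaction handler plus alarm(10)) might look roughly like the sketch below. The helper name run_with_timeout is hypothetical and not part of PDF::API2; only core Perl and the POSIX module are used. Handlers installed with POSIX::sigaction are not deferred by default, which is presumably why it works where a plain $SIG{ALRM} might not, but dying from such a handler can interrupt code at an awkward point, so treat this as a stopgap rather than a fix. A call could look like my $pdf = run_with_timeout(10, sub { PDF::API2->from_string($data) });

```perl
use strict;
use warnings;
use POSIX qw(SIGALRM);

# Hypothetical helper: run $code, but give up after $seconds.
# Returns whatever $code returns (scalar context), or dies with "timeout\n".
sub run_with_timeout {
    my ($seconds, $code) = @_;

    # Install a SIGALRM handler via POSIX::sigaction. Handlers installed this
    # way are "unsafe" (not deferred) unless ->safe(1) is set, so they can
    # interrupt long-running operations.
    my $old_action = POSIX::SigAction->new;
    POSIX::sigaction(SIGALRM, POSIX::SigAction->new(sub { die "timeout\n" }),
                     $old_action)
        or die "Cannot install SIGALRM handler: $!";

    my $result;
    my $ok = eval {
        alarm($seconds);
        $result = $code->();
        alarm(0);
        1;
    };
    my $error = $@;

    alarm(0);                                  # never leave an alarm pending
    POSIX::sigaction(SIGALRM, $old_action);    # restore the previous handler

    die $error unless $ok;
    return $result;
}
```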
==========================================================================
I am not surprised that this doesn't work. The header claims to be PDF-1.6, which is beyond what PDF::Builder or PDF::API2 supports, and I see that the first object is an "object stream" (a PDF 1.5 feature), which I know is not supported by either library. That alone could well be killing it.
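To spot files like this before handing them to the library, a crude pre-check of the header version and of object streams could help. The sketch below is a hypothetical helper, not part of either library: it only scans raw bytes for the %PDF-x.y header (within the first 1024 bytes) and for a /Type /ObjStm dictionary, and it ignores a /Version override in the document catalog, so it can give false answers on unusual files.

```perl
use strict;
use warnings;

# Hypothetical helper: report the %PDF-x.y header version and whether any
# object stream dictionary (/Type /ObjStm) appears. Crude byte scan only.
sub pdf_quick_check {
    my ($file) = @_;
    open(my $fh, '<:raw', $file) or die "Cannot open $file: $!";
    my $data = do { local $/; <$fh> };    # slurp the whole file
    close($fh);

    # Readers tolerate some junk before the header, so look at the first 1 KB.
    my ($version) = substr($data, 0, 1024) =~ /%PDF-(\d+\.\d+)/;

    # Stream dictionaries are never compressed, so /ObjStm is visible as text.
    my $object_streams = $data =~ m{/Type\s*/ObjStm\b} ? 1 : 0;

    return { version => $version, object_streams => $object_streams };
}
```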
Is there any chance of producing the original PDF at level 1.4? If you can, it would be interesting to see if it works then.
By the way, this seems a rather convoluted way to extract a page and put it into another PDF (or whatever you do with the page image). Consider
PDF::API2->open($filename)
instead of reading the file into a string and then calling from_string().
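A minimal sketch of that simpler route, assuming PDF::API2 2.040 or later (where the methods are named open, import_page and save): the library reads the file itself and page 1 is copied into a new PDF. The helper name extract_first_page is hypothetical. This only removes the read-into-a-string step; with this particular PDF-1.6 file the parser may well still run into the same trouble.

```perl
use strict;
use warnings;
use PDF::API2;

# Hypothetical helper: copy page 1 of $in_file into a new one-page PDF.
sub extract_first_page {
    my ($in_file, $out_file) = @_;
    my $src = PDF::API2->open($in_file);    # let PDF::API2 read the file
    my $dst = PDF::API2->new;
    $dst->import_page($src, 1);             # append source page 1 to $dst
    $dst->save($out_file);
    return;
}
```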