Belval / pdf2image

A Python module that wraps the pdftoppm utility to convert PDFs to PIL Image objects
MIT License

Any interest in new feature: Getting URLs from pdfinfo? #257

Open jpreiss opened 1 year ago

jpreiss commented 1 year ago

I am using your library to rasterize PDFs in my presentation viewer https://github.com/jpreiss/pypdfdeck (branch videos).

I want to add a feature where any embedded URL that starts with file:// is interpreted to mean "instead of the PDF contents, display the video from this local path when viewing this page".

The pdfinfo command-line tool can extract the URLs, but that capability is not exposed through the current Python interface.

To do this, it would be nice if I could lean on pdf2image to properly find the Poppler binaries, etc. Therefore, I added an option to extract URLs to pdfinfo_from_path().
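
For illustration only, here is a rough sketch of the idea, independent of the actual patch. It assumes a Poppler build whose pdfinfo supports the `-url` flag and prints one `page  type  URL` row per link under a header line; both the flag's availability and the exact output layout should be checked against the installed version. The function names (`parse_pdfinfo_urls`, `local_video_pages`, `pdfinfo_urls`) are hypothetical and are not part of pdf2image.

```python
import subprocess
from typing import Dict, List, Tuple


def parse_pdfinfo_urls(output: str) -> List[Tuple[int, str]]:
    """Parse text assumed to come from `pdfinfo -url` into (page, url) pairs.

    Assumed layout (verify against your Poppler version):
        Page  Type          URL
           1  Annotation    https://example.com
    """
    urls = []
    for line in output.splitlines():
        parts = line.split(None, 2)  # page number, link type, URL
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip the header row and blank lines
        urls.append((int(parts[0]), parts[2].strip()))
    return urls


def local_video_pages(urls: List[Tuple[int, str]]) -> Dict[int, str]:
    """Map page number -> local path for every file:// URL.

    Naive prefix stripping; percent-encoded or Windows paths would need
    urllib.parse. If a page has several file:// links, the last one wins.
    """
    prefix = "file://"
    return {page: url[len(prefix):] for page, url in urls if url.startswith(prefix)}


def pdfinfo_urls(pdf_path: str, pdfinfo_bin: str = "pdfinfo") -> List[Tuple[int, str]]:
    """Run pdfinfo with the (assumed) -url flag and parse its output.

    Hypothetical helper: locating the binary is exactly the part that
    pdf2image already handles and that this sketch does not.
    """
    proc = subprocess.run(
        [pdfinfo_bin, "-url", pdf_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_pdfinfo_urls(proc.stdout)
```

A viewer could then call `local_video_pages(pdfinfo_urls("slides.pdf"))` and, for each page in the result, play the video at that path instead of showing the rasterized page.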

This is not ready to merge: it needs design review, tests, an equivalent _from_bytes() version, better docs, etc. I just wanted to check whether this feature is actually desired before I finish the work.

Thanks!