galkahana / HummusJS

Node.js module for high performance creation, modification and parsing of PDF files and streams
http://www.pdfhummus.com

Extract jpg from pdf #341

Closed: msageryd closed this issue 5 years ago

msageryd commented 5 years ago

Is it possible to extract JPG files from PDFs with Hummus?

galkahana commented 5 years ago

You could extract image data from a PDF file. However, images are embedded in a PDF as raw data encoded (not encrypted) with one or more filters. Often that filter is DCTDecode, which is actually JPG, so the stream data can usually be written out as a .jpg file as-is. So you'll need to run through the pages, grab their resource dictionaries, grab the images from there (the /XObject entries whose /Subtype is /Image), read their streams, and export them.

Yeah, it's possible... there's a bit to learn. Gal.
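To illustrate the last step (reading image streams and exporting them), here is a minimal sketch in Python rather than HummusJS, since the thread shows no HummusJS calls. It swaps in a cruder technique than the page walk described above: instead of following each page's resource dictionary, it scans the raw file for stream objects whose only filter is /DCTDecode and writes their bytes out unchanged. It assumes the PDF is not encrypted and that each stream is terminated by `endstream`; images whose DCTDecode filter is chained with another filter (e.g. FlateDecode) are skipped. The function names `extract_dct_streams` and `save_jpegs` are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical helper names; naive byte scan, not a real PDF parser.
# A filter entry of exactly /DCTDecode (bare, or as a one-element array).
# Streams that chain DCTDecode with other filters are skipped.
_DCT_FILTER = re.compile(rb"/Filter\s*(?:/DCTDecode|\[\s*/DCTDecode\s*\])")
# An indirect object: "<num> <gen> obj ... endobj" (non-greedy body).
_OBJECT = re.compile(rb"\d+\s+\d+\s+obj\b(.*?)\bendobj\b", re.DOTALL)
# The "stream" keyword is followed by CRLF or LF before the data begins.
_STREAM_START = re.compile(rb"\bstream\r?\n")


def extract_dct_streams(pdf: bytes) -> list[bytes]:
    """Return the raw bytes of every stream whose only filter is DCTDecode.

    Assumes an unencrypted PDF; ignores the xref table and page tree.
    """
    images = []
    for obj in _OBJECT.finditer(pdf):
        body = obj.group(1)
        start = _STREAM_START.search(body)
        if start is None:
            continue  # not a stream object
        # Only look for the filter in the dictionary, before the data.
        if not _DCT_FILTER.search(body, 0, start.start()):
            continue  # a stream, but not a plain JPEG
        end = body.rfind(b"endstream")
        if end == -1:
            continue
        # Drop the end-of-line marker that precedes "endstream".
        data = body[start.end():end].rstrip(b"\r\n")
        if data.startswith(b"\xff\xd8"):  # JPEG start-of-image marker
            images.append(data)
    return images


def save_jpegs(pdf_path: str, out_dir: str = ".") -> int:
    """Write each extracted JPEG to out_dir/image-N.jpg; return the count."""
    images = extract_dct_streams(Path(pdf_path).read_bytes())
    for i, jpg in enumerate(images, start=1):
        Path(out_dir, f"image-{i}.jpg").write_bytes(jpg)
    return len(images)
```

For example, `save_jpegs("input.pdf", "out")` would write `out/image-1.jpg`, `out/image-2.jpg`, and so on into an existing `out` directory. A more faithful version would follow the page tree and each page's /Resources /XObject entries as described above, which also tells you which page each image belongs to.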

msageryd commented 5 years ago

Thank you. I think I'll go ahead with Poppler and pdftocairo for extracting images.