allenai / pdffigures2

Given a scholarly PDF, extract figures, tables, captions, and section titles.
http://pdffigures2.allenai.org/
Apache License 2.0

Extracting figures in their original resolution #17

Open samyak24jain opened 7 years ago

samyak24jain commented 7 years ago

I've been using pdffigures2 to extract figures from PDFs. It works great! As mentioned in the paper, pdffigures2, unlike pdffigures, does not need to render the PDF to a bitmap in order to extract the graphical elements from the pages. Is there any way to extract the figures at their original resolution, instead of having the user specify the DPI? If not, is there an optimal DPI that gives good resolution for all extracted images? The default of DPI=72 degrades high-resolution images in the PDF, while specifying a higher DPI reduces the quality of low-resolution images. Any help would be appreciated!
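To make "original resolution" concrete: for an embedded raster image, the effective DPI can be estimated from its pixel dimensions and the size of the box it occupies on the page, since PDF user space defaults to 72 points per inch. The sketch below is only that arithmetic. The function names and the input tuples are hypothetical, it is not part of pdffigures2, and it assumes the pixel and box sizes have already been obtained by some other tool. It also does not apply to vector-only figures, which have no native pixel resolution.

```python
# Minimal sketch (hypothetical helpers, not pdffigures2 API).
# Assumes default PDF user space: 1 point = 1/72 inch.
POINTS_PER_INCH = 72.0


def effective_dpi(pixel_width, pixel_height, box_width_pt, box_height_pt):
    """Return (dpi_x, dpi_y) for an image of the given pixel size
    drawn into a box of the given size in PDF points."""
    if box_width_pt <= 0 or box_height_pt <= 0:
        raise ValueError("box dimensions must be positive")
    dpi_x = pixel_width * POINTS_PER_INCH / box_width_pt
    dpi_y = pixel_height * POINTS_PER_INCH / box_height_pt
    return dpi_x, dpi_y


def native_render_dpi(images, fallback=72.0):
    """Pick one DPI for a figure containing several embedded images.

    `images` is an iterable of (pixel_w, pixel_h, box_w_pt, box_h_pt).
    Returns the highest effective DPI found, so that no embedded image
    is downsampled; returns `fallback` if there are no images.
    """
    best = None
    for pixel_w, pixel_h, box_w, box_h in images:
        dpi = max(effective_dpi(pixel_w, pixel_h, box_w, box_h))
        if best is None or dpi > best:
            best = dpi
    return fallback if best is None else best


if __name__ == "__main__":
    # A 600x300 px image placed in a 144x72 pt box (2in x 1in).
    print(effective_dpi(600, 300, 144, 72))
    print(native_render_dpi([(600, 300, 144, 72), (100, 100, 100, 100)]))
```

Taking the maximum over the embedded images is one possible design choice: it avoids downsampling the sharpest image, at the cost of upsampling the others when the figure is rendered at that DPI.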