allenai / pdffigures2

Given a scholarly PDF, extract figures, tables, captions, and section titles.
http://pdffigures2.allenai.org/
Apache License 2.0
611 stars 122 forks

Unable to extract all images from PDF #59

Open Davidwhw opened 5 months ago

Davidwhw commented 5 months ago

When I use pdffigures2 to extract images from a PDF, it often misses some of them. For example, it extracts only 3 images from a PDF file that contains 5. Does pdffigures2 perhaps apply default parameters, such as "image size" or "resolution", that filter out some images? Could you give me some advice or pointers? Thank you for your assistance.