Belval / pdf2image

A python module that wraps the pdftoppm utility to convert PDF to PIL Image object
MIT License

Wrong image converted #258

Open juhonkang opened 1 year ago

juhonkang commented 1 year ago

I use pdf2image to convert a PDF to an image; however, the resulting image is corrupted, like this image

# use pdf2image to convert pdf to image
from pdf2image import convert_from_path

images = convert_from_path('data/Dev_data/All_robert_bosch/exception/IV WGEPEN230072.pdf', dpi=600)

# note: save_fp is the same path as the input PDF, so saving to it overwrites the source file
save_fp = 'data/Dev_data/All_robert_bosch/exception/IV WGEPEN230072.pdf'

images[0].save('data/Dev_data/All_robert_bosch/exception/IV WGEPEN230072_.png', 'PNG')

images[0].save(
    save_fp, "PDF", resolution=100.0, save_all=True, append_images=images[1:]
)

data link: https://we.tl/t-B1CT1CG4hB

juhonkang commented 1 year ago

password is: "nahnah"

Belval commented 1 year ago

This is a common issue when fonts are missing from the OS where pdf2image is installed. You can search the closed issues for a solution that might work for you (https://github.com/Belval/pdf2image/issues?q=is%3Aissue+missing+font+is%3Aclosed), but make sure that poppler-data is installed.
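
One way to check whether missing fonts are a plausible cause is to list the fonts the PDF does not embed, since those have to be supplied by the OS. The following is a minimal sketch, not part of pdf2image: it assumes Poppler's `pdffonts` command-line tool is installed and on PATH, and that its table ends each row with the `emb sub uni object ID` columns. The helper names `parse_non_embedded_fonts` and `non_embedded_fonts` are hypothetical, and font names are assumed to contain no spaces.

```python
import shutil
import subprocess


def parse_non_embedded_fonts(pdffonts_output):
    """Return names of fonts that a pdffonts table marks as not embedded."""
    missing = []
    # The first two lines are the column header and a dashed rule.
    for line in pdffonts_output.splitlines()[2:]:
        tokens = line.split()
        # A row needs at least: name, type, encoding, emb, sub, uni,
        # object number, generation number.
        if len(tokens) < 8:
            continue
        # Counting from the right avoids trouble with multi-word font
        # types such as "Type 1": the "emb" flag is fifth from the end.
        if tokens[-5] == "no":
            missing.append(tokens[0])  # assumes the name has no spaces
    return missing


def non_embedded_fonts(pdf_path):
    """Run pdffonts on pdf_path and list fonts that must come from the OS."""
    if shutil.which("pdffonts") is None:
        raise RuntimeError("pdffonts not found; is Poppler installed and on PATH?")
    result = subprocess.run(
        ["pdffonts", pdf_path], capture_output=True, text=True, check=True
    )
    return parse_non_embedded_fonts(result.stdout)
```

Calling `non_embedded_fonts` with the path of the affected PDF returns a list of font names. If the list is non-empty, those fonts need to be available on the system; an empty list suggests the cause may lie elsewhere.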