Belval / pdf2image

A python module that wraps the pdftoppm utility to convert PDF to PIL Image object
MIT License

Add support for WebP image format #268

Closed drnushooz closed 1 year ago

drnushooz commented 1 year ago

WebP is a modern, open-source image format created by Google that has a number of advantages over JPEG and PNG. This PR adds support for writing WebP files from within the pdf2image library by using Pillow's built-in WebP support. It also corrects some test names that were wrong.

drnushooz commented 1 year ago

@Belval would you have time to look at this PR before the next release? Any comments are welcome.

drnushooz commented 1 year ago

@Belval You're welcome. I made a small modification for space optimization. Please take another look. Thanks!

Belval commented 1 year ago

This was marked as merged but then reverted, as I do not think WebP conversion should happen in pdf2image after all. I try to keep this package a simple wrapper around pdftoppm and pdftocairo, and since WebP is not supported by the rasterizing library, I don't think it should be part of pdf2image.

In the future, for anyone finding this PR, please note that you can convert the format with Pillow after the conversion:

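from io import BytesIO

from PIL import Image
from pdf2image import convert_from_path

# Rasterize every page of the PDF into a PIL Image
images = convert_from_path("your_file.pdf")

webp_images = []
for image in images:
    # Encode the page as WebP in memory, then reopen the buffer as a WebP image
    buf = BytesIO()
    image.save(buf, "webp")
    webp_images.append(Image.open(buf))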
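
If the goal is WebP files on disk rather than in-memory images, the same Pillow call can write straight to a path. The following is only a sketch: the helper name save_pages_as_webp, the page_NNNN.webp naming scheme, and the default quality of 80 are illustrative assumptions, not part of pdf2image.

```python
from pathlib import Path


def save_pages_as_webp(images, out_dir, quality=80):
    """Save each page as out_dir/page_NNNN.webp and return the written paths.

    Hypothetical helper, not part of pdf2image. `images` is any iterable of
    objects with a Pillow-style save(fp, format, **params) method, such as
    the list returned by pdf2image.convert_from_path.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    for page_number, image in enumerate(images, start=1):
        path = out_dir / f"page_{page_number:04d}.webp"
        # Pillow's WebP writer accepts a `quality` option.
        image.save(path, "WEBP", quality=quality)
        paths.append(path)
    return paths
```

It could then be called as save_pages_as_webp(convert_from_path("your_file.pdf"), "out"), assuming Pillow was built with WebP support.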