Belval / pdf2image

A python module that wraps the pdftoppm utility to convert PDF to PIL Image object
MIT License

Due to reduced Image quality (after conversion), text is not readable #241

Open jaiswati opened 1 year ago

jaiswati commented 1 year ago

Describe the bug
Due to reduced image quality after conversion, text is not readable. This was tried in a Colab notebook.

To Reproduce
Steps to reproduce the behavior:

from pdf2image import convert_from_path, convert_from_bytes
from IPython.display import display, Image

images = convert_from_bytes(open('/content/sample_data/test.pdf', 'rb').read(), size=800, dpi=400)
display(images[0])

Expected behavior
Text in the image should be clear.


Desktop
Colab notebook with the Chrome browser

Belval commented 1 year ago

The way the code is written, both parameters (dpi and size) are sent to pdftoppm. This means the issue you are seeing is most likely not at the pdf2image level, but in the underlying pdftoppm utility.
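To make "both parameters are sent to pdftoppm" concrete, here is a rough sketch of how such a command line could be assembled. `build_pdftoppm_args` is a hypothetical helper written for illustration, not pdf2image's actual code; `-r` and `-scale-to` are pdftoppm options, and how pdftoppm reconciles them when both are given is decided by pdftoppm itself.

```python
def build_pdftoppm_args(pdf_path, dpi=200, size=None):
    """Hypothetical helper: assemble a pdftoppm argument list.

    Simplified illustration only; this is not pdf2image's actual code.
    """
    # -r sets the render resolution in DPI
    args = ["pdftoppm", "-r", str(dpi)]
    if size is not None:
        # -scale-to fits the long side of each page into `size` pixels
        args += ["-scale-to", str(size)]
    args.append(pdf_path)
    return args


# With dpi=400 and size=800, both options end up on the same command line.
print(build_pdftoppm_args("test.pdf", dpi=400, size=800))
```

Under this sketch, the resolution request and the scaling request reach pdftoppm together, which is why the final pixel size is not something pdf2image alone controls.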

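My best solution would be to resize the output of pdf2image manually instead of using the size parameter. Something like:

from pdf2image import convert_from_bytes
from IPython.display import display

# Render at 400 DPI without the size parameter, so pdftoppm does no scaling
with open('/content/sample_data/test.pdf', 'rb') as f:
    images = convert_from_bytes(f.read(), dpi=400)

# thumbnail() resizes in place and preserves the aspect ratio
images[0].thumbnail((800, 800))

display(images[0])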
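For anyone unsure about the in-place behavior of `thumbnail()`, a minimal check with Pillow alone might look like the following. The 3000x4000 blank image is an assumed stand-in for a high-DPI page render, so no PDF or poppler install is needed to run it.

```python
from PIL import Image

# Synthetic 3000x4000 image standing in for a high-DPI page render
# (an assumption for illustration; no PDF or poppler needed).
page = Image.new("RGB", (3000, 4000), "white")

# thumbnail() modifies the image in place and returns None.
result = page.thumbnail((800, 800))

# The aspect ratio is preserved, so the image fits inside the 800x800 box.
print(result)     # None
print(page.size)  # (600, 800)
```

Because the call returns `None`, writing `images[0] = images[0].thumbnail(...)` would discard the image; calling it as a bare statement is the intended usage.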