Belval / pdf2image

A python module that wraps the pdftoppm utility to convert PDF to PIL Image object
MIT License

cannot identify image file using pdf2image.convert_from_bytes #233

Open alistairwgillespie opened 2 years ago

alistairwgillespie commented 2 years ago

Hi,

I'm using AWS Lambda to run pipelines that consume PDF documents.

When attempting to optimize memory allocation for pdf2image.convert_from_bytes using context management and an output_folder, I get the following error: `cannot identify image file '/tmp/tmprz6rwu8a/a606ca84-e027-4d88-88aa-6d25099a9776-18.ppm'`

My code looks like so:

  pil_images = None
  images = None
  with tempfile.TemporaryDirectory() as tmpdir:
      pil_images = pdf2image.convert_from_bytes(
          document_bytes,
          dpi=dpi,
          output_folder=tmpdir
      )
      pil_images = [rsz(i, resize) for i in pil_images]
      images = [image_to_bytes(i, fmt) for i in pil_images]
  ...

Any help is much appreciated.
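
For the memory side of the question, one option that might be worth trying is converting a few pages at a time rather than the whole document in one call. The sketch below is only an illustration: `page_ranges` is a hypothetical helper, not part of pdf2image, and the commented usage assumes the `first_page` and `last_page` parameters of `convert_from_bytes` and the `"Pages"` entry returned by `pdfinfo_from_bytes`. It is not known to fix the error above.

```python
def page_ranges(page_count, chunk_size):
    """Yield inclusive, 1-based (first_page, last_page) pairs.

    Hypothetical helper, not part of pdf2image.
    """
    for first in range(1, page_count + 1, chunk_size):
        yield first, min(first + chunk_size - 1, page_count)


# Possible use with pdf2image (untested sketch; requires poppler):
#
#   from pdf2image import convert_from_bytes, pdfinfo_from_bytes
#
#   page_count = int(pdfinfo_from_bytes(document_bytes)["Pages"])
#   for first, last in page_ranges(page_count, 5):
#       pages = convert_from_bytes(
#           document_bytes, dpi=dpi, first_page=first, last_page=last
#       )
#       ...  # resize and serialize this chunk before moving to the next
```

Only one chunk of pages is held at a time this way, at the cost of invoking pdftoppm once per chunk.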

Belval commented 2 years ago

Does this happen with a specific PDF file?
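
To help narrow that down, it may be useful to look at the file named in the error before the temporary directory is removed. The following is a minimal sketch using only the standard library; `check_ppm` is a hypothetical helper, not part of pdf2image or Pillow, and it assumes a binary PPM (`P6`) header without comment lines.

```python
import os
import re

# Magic number, width, height, maxval, then a single whitespace byte
# before the pixel data. Assumes no '#' comment lines in the header.
PPM_HEADER = re.compile(rb"P6\s+(\d+)\s+(\d+)\s+(\d+)\s")


def check_ppm(path):
    """Return a short diagnosis string for a binary PPM file."""
    size = os.path.getsize(path)
    if size == 0:
        return "empty file"
    with open(path, "rb") as f:
        head = f.read(64)
    match = PPM_HEADER.match(head)
    if match is None:
        return "no P6 header found"
    width, height, maxval = (int(g) for g in match.groups())
    # Samples take one byte when maxval < 256, otherwise two.
    bytes_per_sample = 1 if maxval < 256 else 2
    expected = match.end() + width * height * 3 * bytes_per_sample
    if size < expected:
        return f"truncated: {size} of {expected} bytes"
    return "ok"
```

If this reports an empty or truncated file, the problem would be in what pdftoppm wrote to disk rather than in how the image is opened. On Lambda that could point at limited space in /tmp, although nothing in this thread confirms it.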