PDFPageCountError: Unable to get page count on Linux #244

Closed by stockpilothq 1 year ago

stockpilothq commented 1 year ago

Hi everyone,

I've set up a project which uses pdf2image. I installed Poppler with Brew and it works like a charm locally (on macOS).

Production, on the other hand, drives me crazy. I set up a Dockerfile and added the following command: RUN apt update && apt-get install -y poppler-utils

CLI outputs:

$ find / -name poppler-utils
$ find / -name poppler
Everything seems to be installed correctly. But the moment I call convert_from_path I get the following error:

PDFPageCountError: Unable to get page count. Internal Error: Cannot handle URI ''.

Python code:

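        try:
            file_path = f'{file_name}'
            images = convert_from_path(file_path, 500)  # second argument is the DPI
            cleaned_name = str(file_name)[:-4]  # strip the ".pdf" extension
            # `n` was undefined in the post as it survives; enumerate is an assumed fix.
            for n, img in enumerate(images):
                blob = BytesIO()
                img.save(blob, 'JPEG')
                img_entry = ImageEntry.objects.create(invoice=invoice)
                # The field name was lost from the original post; `image` is a placeholder.
                img_entry.image.save(f'{cleaned_name}-{n}.jpg', File(blob), save=True)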

        except PDFInfoNotInstalledError as err:
            print(f"PDFInfoNotInstalledError: {err}")   

        except PDFPageCountError as err:
            print(f"PDFPageCountError: {err}")  

        except PDFSyntaxError as err:
            print(f"PDFSyntaxError: {err}") 

        except Exception as err:
            print(f"Exception: {err}")  
The answers I find when searching for this error are all related to poppler_path and Windows, which does not help. I hope someone can help me with this issue.

Thanks in advance.

Belval commented 1 year ago

Internal Error: Cannot handle URI ''.

It seems like you are trying to parse a PDF hosted on a webpage. This is not supported. You need to download the file locally (to disk or memory) before trying to parse it.
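
For anyone landing here with the same error, a minimal sketch of the "download first" approach could look like the following. It assumes the PDF location is either a local path or a plain http(s) URL; the helper names is_remote and load_pdf_pages are made up for illustration and are not part of pdf2image. Only convert_from_path and convert_from_bytes come from the library.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def is_remote(location: str) -> bool:
    """Return True when `location` looks like an http(s) URL rather than a local path."""
    return urlparse(location).scheme in ("http", "https")


def load_pdf_pages(location: str, dpi: int = 200):
    """Convert a PDF to a list of PIL images, downloading it first if it is remote.

    Hypothetical helper: the name and the default DPI are assumptions.
    """
    # Imported lazily so is_remote() stays usable without pdf2image installed.
    from pdf2image import convert_from_bytes, convert_from_path

    if is_remote(location):
        # Read the whole file into memory, then hand the raw bytes to pdf2image.
        with urlopen(location, timeout=30) as response:
            return convert_from_bytes(response.read(), dpi=dpi)
    # Local file: pass the path straight through.
    return convert_from_path(location, dpi=dpi)
```

With something like this in place, the call in the original snippet would become images = load_pdf_pages(file_path, 500). If the file lives behind authentication or in a storage backend, the download step would need to be replaced with whatever client that backend provides.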

stockpilothq commented 1 year ago

Works, thanks so much!