Belval / pdf2image

A python module that wraps the pdftoppm utility to convert PDF to PIL Image object
MIT License

pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH? #142

Closed sureshdarsru closed 3 years ago

sureshdarsru commented 4 years ago

My Code:

images = convert_from_path('E:\Program Files\Python\Python37-32\SrcFiles\2234567895.pdf',poppler_path='E:\Program Files\Python\Lib\site-packages\pdf2image\poppler-0.68.0\bin')

for i, image in enumerate(images):
    fname = "image" + str(i) + ".tif"
    image.save(fname, "TIF")

Error Message:

Traceback (most recent call last):
  File "E:\Program Files\Python\lib\site-packages\pdf2image\pdf2image.py", line 420, in pdfinfo_from_path
    proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
  File "E:\Program Files\Python\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "E:\Program Files\Python\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\Program Files\Python\Python37-32\Conv.py", line 67, in <module>
    images = convert_from_path('E:\Program Files\Python\Python37-32\SrcFiles\2234567895.pdf',poppler_path='E:\Program Files\Python\Lib\site-packages\pdf2image\poppler-0.68.0\bin')
  File "E:\Program Files\Python\lib\site-packages\pdf2image\pdf2image.py", line 94, in convert_from_path
    page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
  File "E:\Program Files\Python\lib\site-packages\pdf2image\pdf2image.py", line 441, in pdfinfo_from_path
    raise PDFInfoNotInstalledError(
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

pdfinfo response:

pdfinfo E:\Program Files\Python\Python37-32\SrcFiles\2234567895.pdf
  File "", line 1
    pdfinfo E:\Program Files\Python\Python37-32\SrcFiles\2234567895.pdf
SyntaxError: invalid syntax

Kindly help : suresh.darsru@gmail.com

Belval commented 4 years ago

Hi, thank you for providing all the necessary information. Unfortunately, I think a bug in the libpoppler library is causing the issue, not one in pdf2image. I would suggest first checking that you are able to open the PDF document in Chrome.

I don't mind taking a look if you can provide a sample PDF to reproduce the issue.

sureshdarsru commented 4 years ago

Thanks for your response. I am able to open a pdf in Chrome. No issues. Uploaded this file. Kindly check. 1234567890.1.pdf

sureshdarsru commented 4 years ago

Could I please have an update on my query above?

Belval commented 4 years ago

Your PDF can be converted on my machine without any issue. I think your issue resides in how pdfinfo is installed on your system.

Please try to run pdfinfo.exe your_file.pdf and report back the output.

sureshdarsru commented 4 years ago

Hi Belval, thanks for your response. Please find below the response I received for pdfinfo.

E:\Program Files\Python\Lib\site-packages\pdf2image\poppler-0.68.0\bin>pdfinfo E:\Program Files\Python\Python37-32\SrcFiles\1234567890.1.pdf
pdfinfo version 0.68.0
Copyright 2005-2018 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
Usage: pdfinfo [options]
  -f : first page to convert
  -l : last page to convert
  -box : print the page bounding boxes
  -meta : print the document metadata (XML)
  -js : print all JavaScript in the PDF
  -struct : print the logical document structure (for tagged files)
  -struct-text : print text contents along with document structure (for tagged files)
  -isodates : print the dates in ISO-8601 format
  -rawdates : print the undecoded date strings directly from the PDF file
  -dests : print all named destinations in the PDF
  -enc : output text encoding name
  -listenc : list available encodings
  -opw : owner password (for encrypted files)
  -upw : user password (for encrypted files)
  -v : print copyright and version info
  -h : print usage information
  -help : print usage information
  --help : print usage information
  -? : print usage information

E:\Program Files\Python\Lib\site-packages\pdf2image\poppler-0.68.0\bin>

Please let me know in detail if there is any issue. Thanks in advance for your quick responses :-)

Belval commented 4 years ago

It's not returning the right thing because your path contains spaces. Make sure to quote the path or escape those spaces (pdf2image does this automatically, but the console does not).
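
To illustrate the point about spaces, here is a minimal Python sketch. The path is the one from this thread. The whitespace split is only a simplified stand-in for how a console tokenizes an unquoted command, and subprocess.list2cmdline is used here only to display the quoting that subprocess applies on Windows.

```python
import subprocess

# Path copied from the thread; any path containing a space behaves the same way.
pdf_path = r"E:\Program Files\Python\Python37-32\SrcFiles\1234567890.1.pdf"

# Typed unquoted in a console, the command is split on whitespace,
# so pdfinfo receives two path fragments instead of one file name.
unquoted_tokens = ("pdfinfo " + pdf_path).split()

# Passing the arguments as a list keeps the path as a single argument.
args = ["pdfinfo", pdf_path]

# list2cmdline shows the quoted command line that subprocess builds on Windows.
quoted_command = subprocess.list2cmdline(args)

print(len(unquoted_tokens))
print(quoted_command)
```

In the console, the equivalent fix is to wrap the PDF path in double quotes.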

sureshdarsru commented 4 years ago

I moved the pdf file under bin folder and executed pdfinfo and got the response as below.

E:\Program Files\Python\Lib\site-packages\pdf2image\poppler-0.68.0\bin>pdfinfo 1234567890.1.pdf
Author: Dainik
Creator: Microsoft® Word 2016
Producer: www.ilovepdf.com
CreationDate: 09/06/19 16:23:09 India Standard Time
ModDate: 09/06/19 16:23:10 India Standard Time
Tagged: yes
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 1
Encrypted: no
Page size: 612 x 792 pts (letter)
Page rot: 0
File size: 146367 bytes
Optimized: no
PDF version: 1.5

E:\Program Files\Python\Lib\site-packages\pdf2image\poppler-0.68.0\bin>
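
The traceback earlier in this thread shows pdf2image reading the "Pages" entry from pdfinfo. The sketch below shows how output like the above could be turned into a dictionary. It is an illustration only, not the library's actual parsing code, and parse_pdfinfo is a hypothetical helper name.

```python
def parse_pdfinfo(output):
    """Turn 'Key: value' lines into a dict, splitting on the first colon only."""
    info = {}
    for line in output.splitlines():
        if ":" not in line:
            continue
        # Values such as timestamps contain colons, so split only once.
        key, value = line.split(":", 1)
        info[key.strip()] = value.strip()
    return info


# A few lines taken from the pdfinfo output posted in this thread.
sample = """Author: Dainik
CreationDate: 09/06/19 16:23:09 India Standard Time
Pages: 1
Page size: 612 x 792 pts (letter)
"""

info = parse_pdfinfo(sample)
page_count = int(info["Pages"])
print(page_count)
```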

I hope it's fine to proceed. Please check and reply. Thanks.

Belval commented 4 years ago

So your pdfinfo installation is fine. My guess would be that you are not using the right path when passing poppler_path to convert_from_path.
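
One way to test that guess is to check, from Python, whether the folder given as poppler_path really contains the executables. This is a minimal sketch; missing_poppler_tools is a hypothetical helper, and the tool names checked (pdfinfo and pdftoppm) are the two mentioned in this thread.

```python
from pathlib import Path


def missing_poppler_tools(poppler_path, tools=("pdfinfo", "pdftoppm")):
    """Return the tools that are not present as files in poppler_path."""
    folder = Path(poppler_path)
    missing = []
    for tool in tools:
        # Accept both the bare name (Linux/macOS) and the .exe name (Windows).
        candidates = [folder / tool, folder / (tool + ".exe")]
        if not any(candidate.is_file() for candidate in candidates):
            missing.append(tool)
    return missing


# Example call with the folder used later in this thread (adjust to your system).
print(missing_poppler_tools(r"C:\Program Files\poppler-0.68.0\bin"))
```

An empty list means both executables were found. Anything else points at a wrong or mistyped folder.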

sureshdarsru commented 4 years ago

Hi Belval, Thanks for your quicker responses. Let me check and get back.

deppmish2 commented 4 years ago

@Belval I get the same error in a Docker image. How can I fix it? Why does poppler_path need to be explicitly defined when using convert_from_bytes?

Senzokuhle commented 4 years ago

@deppmish2 and @sureshdarsru

I am having the same issue, did you ever find a solution that worked for you? please help.

deppmish2 commented 4 years ago

@sureshdarsru yes, installing poppler-utils worked for me: apt-get install -y poppler-utils

Senzokuhle commented 4 years ago

specifying the poppler path in the file solved it for me:

convert_from_path(poppler_path=r'C:\Program Files\poppler-0.68.0\bin')

sandeepmj commented 4 years ago

I'm running into the same issue while running it on Google Colab. Any solution to avoid that error on Google Colab?

Belval commented 4 years ago

I am not familiar with Google Colab, but you generally have two possible solutions when running in a constrained environment in which you do not have root access:

• Installing with conda: conda install -c conda-forge poppler
• Uploading the binaries and using poppler_path=your_directory/

In both cases the process is a bit more involved. You can see my repo on how to get the binaries for Ubuntu: https://github.com/Belval/pdf2image-as-a-service/tree/master/as-a-function (you can run build_poppler.sh to get the binaries).

Hopefully that helps.

sandeepmj commented 4 years ago

Thanks! I’ll give it a shot today and update you on its success (hopefully).

On Aug 20, 2020, 10:39 AM -0400, Edouard Belval wrote:

I am not familiar with Google Colab, but you generally have two possible solutions when running in constrained environment on which you do not have root access:

• Installing with conda: conda install -c conda-forge poppler
• Uploading the binaries and using poppler_path=your_directory/

In both case the process is a bit more involved. You can see my repo on how to get the binaries for Ubuntu: https://github.com/Belval/pdf2image-as-a-service/tree/master/as-a-function (You can run build_poppler.sh to get the binaries). Hopefully that helps.

hemajayachandran commented 4 years ago

Hi Belval,

I am getting the same error once I publish my azure function which has convert_from_path() in it.

Result: Failure
Exception: PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
Stack:
  File "/azure-functions-host/workers/python/3.7/LINUX/X64/azure_functions_worker/dispatcher.py", line 349, in _handle__invocation_request
    self.__run_sync_func, invocation_id, fi.func, args)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/azure-functions-host/workers/python/3.7/LINUX/X64/azure_functions_worker/dispatcher.py", line 511, in __run_sync_func
    return func(**params)
  File "/home/site/wwwroot/AzureBlobTriggerFunc/__init__.py", line 48, in main
    images = convert_from_path(download_file.name, output_folder=path)
  File "/home/site/wwwroot/.python_packages/lib/site-packages/pdf2image/pdf2image.py", line 97, in convert_from_path
    page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
  File "/home/site/wwwroot/.python_packages/lib/site-packages/pdf2image/pdf2image.py", line 468, in pdfinfo_from_path
    "Unable to get page count. Is poppler installed and in PATH?"

When I publish the function, I can see the poppler-utils package being installed. Still, when I trigger this Azure function, I get the above error. I also tried passing poppler_path to convert_from_path().

Please let me know on this. Thanks in advance.

Belval commented 4 years ago

I am unfamiliar with the Azure function environment and as such this is general advice.

That being said, you should try to troubleshoot it by simply having a function that opens a process and prints the help of pdftoppm (poppler). You will be able to get a different message that might be more relevant.

Something like this:

import subprocess

def main():
   p = subprocess.Popen(["pdftoppm", "-h"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
   out, err = p.communicate()
   print(out, err)

As a general recommendation, I would bundle the poppler utilities with your package to avoid installing it in the function environment. This will save you a great deal of headaches and you can call the function with poppler_path.
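
As a rough illustration of the bundling idea, the sketch below resolves a Poppler folder shipped next to the function's own module. The folder name poppler_bin and the helper bundled_poppler_path are assumptions made for this example, not anything defined by pdf2image or Azure.

```python
from pathlib import Path


def bundled_poppler_path(package_file, folder_name="poppler_bin"):
    """Return the bundled Poppler directory next to package_file, or None if absent."""
    candidate = Path(package_file).resolve().parent / folder_name
    return str(candidate) if candidate.is_dir() else None


# Hypothetical use inside the function module:
#   poppler_path = bundled_poppler_path(__file__)
#   images = convert_from_path(pdf_file, poppler_path=poppler_path)
# When the folder is missing, None is returned and pdf2image falls back to PATH.
```

Resolving the folder relative to the module keeps the call independent of the working directory the host uses at runtime.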

hemajayachandran commented 4 years ago

Thanks Belval for the response. I will try this option.

TACHENZHICHAO commented 3 years ago

Haven't you solved it yet? Download poppler-0.68.0_x86 and add its bin folder to PATH: 'C:\poppler-0.68.0\bin'. Then you're finished.

hemajayachandran commented 3 years ago

I solved this by deploying the function via Docker into Azure Container Registry. I no longer get this error.

Pfed-prog commented 3 years ago

Guys, thank you very much. Adding Poppler to the PATH worked on Windows 10. I did not find a download link in the discussion, so here it is: https://blog.alivate.com.au/poppler-windows/

MarynaLongnickel commented 3 years ago

I'm having the same issue. I put the path to poppler in the system PATH, but I am still getting pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

Belval commented 3 years ago

First, make sure that when you open cmd.exe on Windows you can call pdfinfo. If you can't, then Poppler is not really in PATH. If you can, then something is changing your PATH variable at execution time. Look for virtual environments, such as those created by your IDE. For example, PyCharm can create an execution process that doesn't copy the PATH variable. This has also happened with conda, although I don't remember what the user was doing that triggered the issue.

Finally, I would recommend just passing the poppler_path parameter. It is much simpler (on Windows) and causes fewer issues in the long run, especially when deploying to servers.
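
To see what the Python process itself can find (as opposed to cmd.exe), a small diagnostic like the following can be run from the same IDE or environment that raises the error. It uses only the standard library; poppler_visibility is a hypothetical helper name.

```python
import os
import shutil


def poppler_visibility(tool="pdfinfo"):
    """Report where this Python process finds the tool, plus the PATH it searched."""
    return {
        "found_at": shutil.which(tool),  # None when the tool is not on PATH
        "path_entries": os.environ.get("PATH", "").split(os.pathsep),
    }


report = poppler_visibility()
print(report["found_at"])
for entry in report["path_entries"]:
    print(entry)
```

If pdfinfo works in cmd.exe but found_at is None here, the PATH seen by the Python process differs from the system one.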

Pfed-prog commented 3 years ago

Utilizing poppler_path did not solve the exe problem even though the script ran in python.

gohjiayi commented 3 years ago

Same issue: pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH? I am unable to find the installed poppler directory. I installed it via both pip and conda, but the file path is C:\Users\name\AppData\Local\Programs\Python\Python36\Lib\site-packages\poppler and it does not seem to have the bin folder mentioned in the suggested answer. Any idea why?

Edit: I downloaded the poppler files for Windows, stored them on my C: drive, and everything finally works. When I downloaded poppler 0.89.0 straight from Anaconda, it ended with the following error because some files were missing: "pdfinfo.exe - System Error. The code execution cannot proceed because openjp2.dll was not found. Reinstalling the program may fix this problem"

pavel-nesterov commented 3 years ago

@sandeepmj

For me, these lines of code helped. I executed them before importing, so my code looks like this in Colab:

!sudo apt-get install -y poppler-utils
!pip install --upgrade pdf2image
from pdf2image import convert_from_path

ABHINAYGUPTA123 commented 3 years ago

use this, it will definitely work in colab

from pdf2image import convert_from_path

from pdf2image.exceptions import (
    PDFInfoNotInstalledError,
    PDFPageCountError,
    PDFSyntaxError
)

images = convert_from_path('main_sleep-in-america-poll-national-sleep-foundation.pdf')

for i, image in enumerate(images):
    fname = "image" + str(i) + ".png"
    image.save(fname, "PNG")

bharath-kumarn commented 3 years ago

formsense-microservice_1 |     images = convert_from_path(file)
formsense-microservice_1 |   File "/usr/local/lib/python3.6/site-packages/pdf2image/pdf2image.py", line 98, in convert_from_path
formsense-microservice_1 |     page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
formsense-microservice_1 |   File "/usr/local/lib/python3.6/site-packages/pdf2image/pdf2image.py", line 485, in pdfinfo_from_path
formsense-microservice_1 |     "Unable to get page count. Is poppler installed and in PATH?"
formsense-microservice_1 | pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

tspb commented 3 years ago

poppler_path=r'C:\Program Files (x86)\poppler-21.09.0\Library\bin'

i.e.: use r'' format
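
A short sketch of why the r'' prefix matters: in a normal Python string, \b is the backspace character, so a Windows path ending in \bin is silently altered. The path below is shortened for illustration. The original post's poppler_path appears to hit this same escape, and its PDF path likely hits another one (\223 reads as an octal escape).

```python
# Without the r prefix, "\b" is interpreted as a backspace character (0x08).
plain = "C:\\poppler-0.68.0\bin"

# With the r prefix, every backslash is kept literally.
raw = r"C:\poppler-0.68.0\bin"

print(repr(plain))
print(repr(raw))
print("\x08" in plain)  # True: the folder name is no longer "bin"
```

Doubling every backslash or using forward slashes avoids the problem as well.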

Nithishrish23 commented 1 year ago

D:\PythonProjects\KYC\venv\Scripts\python.exe D:\PythonProjects\KYC\Fileread.py
D:\DMC\AI\process\HZSPS8769A
D:\DMC\AI\Input\HZSPS8769A.pdf
Traceback (most recent call last):
  File "D:\PythonProjects\KYC\venv\lib\site-packages\pdf2image\pdf2image.py", line 568, in pdfinfo_from_path
    proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
  File "C:\Users\DELL\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\DELL\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\PythonProjects\KYC\Fileread.py", line 20, in <module>
    images=convert_from_path(pdfPath,poppler_path=r"C:\Program Files\poppler-23.10.0")
  File "D:\PythonProjects\KYC\venv\lib\site-packages\pdf2image\pdf2image.py", line 127, in convert_from_path
    page_count = pdfinfo_from_path(
  File "D:\PythonProjects\KYC\venv\lib\site-packages\pdf2image\pdf2image.py", line 594, in pdfinfo_from_path
    raise PDFInfoNotInstalledError(
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

Process finished with exit code 1

How can I solve this?

Nithishrish23 commented 1 year ago

My poppler-23.10.0 folder does not have a bin directory.
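
Other comments in this thread use paths ending in Library\bin, which suggests that some Windows builds place the executables one level deeper. A minimal sketch for locating the right folder is to search the unzipped directory for pdfinfo; find_poppler_bin is a hypothetical helper, not part of pdf2image.

```python
from pathlib import Path


def find_poppler_bin(root):
    """Return the first directory under root that contains pdfinfo or pdfinfo.exe."""
    for name in ("pdfinfo.exe", "pdfinfo"):
        for match in sorted(Path(root).rglob(name)):
            if match.is_file():
                return str(match.parent)
    return None


# Example call with the folder from the traceback above (adjust to your system).
print(find_poppler_bin(r"C:\Program Files\poppler-23.10.0"))
```

The returned folder, if any, is the value to pass as poppler_path.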

divakarkumarp commented 6 months ago

To resolve the "PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?" issue, please follow these steps:

  1. Download the latest poppler zip file from here
  2. Unzip it to a preferred location, for example C:\Program Files (x86).
  3. After unzipping the file, go to the poppler bin folder, copy its path, and add it to the system Path environment variable:

'C:\Program Files (x86)\poppler-24.02.0\Library\bin'

  4. Restart your VS Code or Jupyter notebook.

Nithishrish23 commented 6 months ago

Thank you, Divakar Kumar. I had already found a solution, along with several alternative methods. Anyway, thanks!

On Wed, May 8, 2024 at 12:38, Divakar wrote:

To resolve the "PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?" issue, please follow these steps:

'C:\Program Files (x86)\poppler-24.02.0\Library\bin'

SGCODEX commented 4 months ago

This worked, thank you.

lix19937 commented 1 month ago

https://stackoverflow.com/questions/53481088/poppler-in-path-for-pdf2image
https://blog.csdn.net/qq_40600379/article/details/136153779