Closed sureshdarsru closed 3 years ago
Hi, thank you for providing all the necessary information. Unfortunately, I think the issue is caused by a bug in the libpoppler library, not by one in pdf2image. I would suggest first checking that you are able to open the PDF document in Chrome.
I don't mind taking a look if you can provide a sample PDF to reproduce the issue.
Thanks for your response. I am able to open a pdf in Chrome. No issues. Uploaded this file. Kindly check. 1234567890.1.pdf
Can I have your update please on my query above
Your PDF can be converted on my machine without any issue, so I think your issue resides in how pdfinfo is installed on your system.
Please try to run pdfinfo.exe your_file.pdf and report back the output.
Hi Belval, thanks for your response. Please find below the output I received from pdfinfo.
E:\Program Files\Python\Lib\site-packages\pdf2image\poppler-0.68.0\bin>pdfinfo E:\Program Files\Python\Python37-32\SrcFiles\1234567890.1.pdf
pdfinfo version 0.68.0
Copyright 2005-2018 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
Usage: pdfinfo [options]
E:\Program Files\Python\Lib\site-packages\pdf2image\poppler-0.68.0\bin>
Please let me know if any issue in detail. Thanks in advance for your quicker responses :-)
It's not returning the right thing because your path contains spaces. Make sure to quote the path or escape those spaces (pdf2image will do it automatically, but the console will not).
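A small sketch of the quoting point, in case it helps. The PDF path below is a made-up example with a space in it, not the exact file from this thread. It shows that when arguments are passed as a list, Python's subprocess module builds a Windows command line with the quotes already added, which is what you would have to type by hand in the console.

```python
import subprocess

# Hypothetical path containing a space, similar to the one reported above.
pdf_path = r"E:\Program Files\Python\SrcFiles\sample.pdf"

# Passing arguments as a list keeps each argument intact.
args = ["pdfinfo", pdf_path]

# list2cmdline shows the single command-line string Windows would receive.
command_line = subprocess.list2cmdline(args)
print(command_line)
```

In cmd.exe the equivalent is to wrap the path in double quotes yourself, for example pdfinfo "E:\Program Files\Python\SrcFiles\sample.pdf".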
I moved the pdf file under bin folder and executed pdfinfo and got the response as below.
E:\Program Files\Python\Lib\site-packages\pdf2image\poppler-0.68.0\bin>pdfinfo 1234567890.1.pdf
Author: Dainik
Creator: Microsoft® Word 2016
Producer: www.ilovepdf.com
CreationDate: 09/06/19 16:23:09 India Standard Time
ModDate: 09/06/19 16:23:10 India Standard Time
Tagged: yes
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 1
Encrypted: no
Page size: 612 x 792 pts (letter)
Page rot: 0
File size: 146367 bytes
Optimized: no
PDF version: 1.5
E:\Program Files\Python\Lib\site-packages\pdf2image\poppler-0.68.0\bin>
I hope it's fine to proceed. Please check and reply. Thanks.
So your pdfinfo installation is fine. My guess would be that you are not using the right path when passing poppler_path to convert_from_path.
Hi Belval, Thanks for your quicker responses. Let me check and get back.
@Belval I get the same error in a Docker image. How do I fix it? Why does poppler_path need to be explicitly defined when using convert_from_bytes?
@deppmish2 and @sureshdarsru
I am having the same issue, did you ever find a solution that worked for you? please help.
@sureshdarsru yes, installing poppler-utils worked for me --> apt-get install -y poppler-utils
specifying the poppler path in the file solved it for me:
convert_from_path(poppler_path=r'C:\Program Files\poppler-0.68.0\bin')
I'm running into the same issue while running it on Google Colab. Any solution to avoid that error on Google Colab?
I am not familiar with Google Colab, but you generally have two possible solutions when running in a constrained environment where you do not have root access:
- Installing with conda: conda install -c conda-forge poppler
- Uploading the binaries and using poppler_path=your_directory/
In both cases the process is a bit more involved. You can see my repo on how to get the binaries for Ubuntu: https://github.com/Belval/pdf2image-as-a-service/tree/master/as-a-function (You can run build_poppler.sh to get the binaries).
Hopefully that helps.
Thanks! I'll give it a shot today and update you on its success (hopefully).
Hi Belval,
I am getting the same error once I publish my azure function which has convert_from_path() in it.
Result: Failure
Exception: PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
Stack:
  File "/azure-functions-host/workers/python/3.7/LINUX/X64/azure_functions_worker/dispatcher.py", line 349, in _handle__invocation_request
    self.__run_sync_func, invocation_id, fi.func, args)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/azure-functions-host/workers/python/3.7/LINUX/X64/azure_functions_worker/dispatcher.py", line 511, in __run_sync_func
    return func(**params)
  File "/home/site/wwwroot/AzureBlobTriggerFunc/__init__.py", line 48, in main
    images = convert_from_path(download_file.name, output_folder=path)
  File "/home/site/wwwroot/.python_packages/lib/site-packages/pdf2image/pdf2image.py", line 97, in convert_from_path
    page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
  File "/home/site/wwwroot/.python_packages/lib/site-packages/pdf2image/pdf2image.py", line 468, in pdfinfo_from_path
    "Unable to get page count. Is poppler installed and in PATH?"
When I publish the function, I can see the poppler-utils package being installed. Still, when I trigger this Azure function, I get the above error. I also tried passing poppler_path to convert_from_path().
Please let me know on this. Thanks in advance.
I am unfamiliar with the Azure function environment and as such this is general advice.
That being said, you should try to troubleshoot it by simply having a function that opens a process and prints the help of pdftoppm (poppler). You will be able to get a different message that might be more relevant.
Something like this:
import subprocess
def main():
    p = subprocess.Popen(["pdftoppm", "-h"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    print(out, err)
As a general recommendation, I would bundle the poppler utilities with your package to avoid installing them in the function environment. This will save you a great deal of headaches, and you can call the function with poppler_path.
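A slightly more defensive variant of the same troubleshooting idea, offered as a sketch rather than a recommended recipe. The helper name probe_tool is made up for this example. It tries to start each tool with the -v flag and turns the "executable not found" case into a readable message instead of a traceback.

```python
import subprocess


def probe_tool(name, flag="-v"):
    """Try to start a command-line tool and report whether it could be launched."""
    try:
        result = subprocess.run(
            [name, flag],
            capture_output=True,
            text=True,
            errors="replace",
            timeout=10,
        )
    except FileNotFoundError:
        # The executable is not reachable through PATH in this process.
        return False, f"{name} was not found on PATH"
    except OSError as exc:
        # Found, but could not be started (permissions, wrong architecture, ...).
        return False, f"{name} could not be started: {exc}"
    # Poppler tools print their version banner to stderr, so keep both streams.
    return True, (result.stdout + result.stderr).strip()


for tool in ("pdfinfo", "pdftoppm"):
    found, detail = probe_tool(tool)
    print(tool, "OK" if found else "MISSING", detail)
```

Running this inside the deployed function, rather than on your own machine, is the point: it shows what that environment can actually see.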
Thanks Belval for the response. I will try this option.
Haven't you solved it yet? Download poppler-0.68.0_x86 and add this folder to your PATH: C:\poppler-0.68.0\bin. Then you're finished.
I solved this by deploying the function via Docker to Azure Container Registry. After that I no longer got this error.
Guys, thank you very much. Adding it to the PATH worked on Windows 10. I did not find the link in the discussion, so here it is: https://blog.alivate.com.au/poppler-windows/
I'm having the same issue. Put path to poppler in system path, but still getting
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
First, make sure that when you open cmd.exe on Windows you can call pdfinfo. If you can't, then Poppler is not really in PATH. If you can, then something is changing your PATH variable at execution time. Look for virtual environments such as those created by your IDE. For example, PyCharm can create an execution process that doesn't copy the PATH variable. This has also happened with conda, although I don't remember what the user was doing that triggered the issue.
Finally, I would recommend just passing the poppler_path parameter. It is much simpler (on Windows) and causes fewer issues in the long run, especially when deploying to servers.
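To see what PATH the Python process itself received, as opposed to what cmd.exe shows, something like the following can be run from inside the IDE or virtual environment. This is only a sketch; describe_poppler_lookup is a name invented for the example.

```python
import os
import shutil


def describe_poppler_lookup(tool="pdfinfo", path_value=None):
    """Return where `tool` resolves from and the PATH entries that were searched."""
    if path_value is None:
        # Use the PATH of the current Python process, which may differ from cmd.exe.
        path_value = os.environ.get("PATH", "")
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    resolved = shutil.which(tool, path=path_value)
    return resolved, entries


resolved, entries = describe_poppler_lookup()
print("pdfinfo resolves to:", resolved)
for entry in entries:
    print("  PATH entry:", entry)
```

If pdfinfo resolves in cmd.exe but prints None here, the PATH is being changed before your script runs, and passing poppler_path explicitly avoids the problem.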
Utilizing poppler_path did not solve the exe problem even though the script ran in python.
Same issue: pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH? I am unable to find the installed poppler directory. I have installed it via both pip and conda, but the file path is C:\Users\name\AppData\Local\Programs\Python\Python36\Lib\site-packages\poppler and it does not seem to have the bin folder mentioned in the suggested answer. Any idea why?
Edit: I downloaded the poppler build for Windows, stored it on my C: drive, and everything finally works. When I downloaded poppler 0.89.0 straight from Anaconda, I ended up with the following error because some files were missing: "pdfinfo.exe - System Error. The code execution cannot proceed because openjp2.dll was not found. Reinstalling the program may fix this problem"
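When it is unclear which folder actually holds the binaries (other comments in this thread mention both a bin folder and a Library\bin folder), a short search can locate it. This is a sketch; find_poppler_bin is a hypothetical helper, and the root folder in the usage comment is an example, not a required location.

```python
import os


def find_poppler_bin(root):
    """Walk `root` and return the first folder containing pdfinfo, or None."""
    wanted = {"pdfinfo", "pdfinfo.exe"}
    for folder, _subdirs, files in os.walk(root):
        # Compare case-insensitively, since Windows file names may vary in case.
        if wanted.intersection(name.lower() for name in files):
            return folder
    return None


# Example usage with a hypothetical download location:
# print(find_poppler_bin(r"C:\poppler-0.68.0"))
```

If this returns None for a folder under site-packages, that folder does not contain the command-line tools, and a separately downloaded Poppler build is needed.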
@sandeepmj
I'm running into the same issue while running it on Google Colab. Any solution to avoid that error on Google Colab?
For me these lines of code helped. I executed them before importing, so my code looks like this in Colab:
!sudo apt-get install -y poppler-utils
!pip install --upgrade pdf2image
from pdf2image import convert_from_path
from pdf2image.exceptions import (PDFInfoNotInstalledError, PDFPageCountError, PDFSyntaxError)
images = convert_from_path('main_sleep-in-america-poll-national-sleep-foundation.pdf')
for i, image in enumerate(images):
    fname = "image" + str(i) + ".png"
    image.save(fname, "PNG")
formsense-microservice_1 | images = convert_from_path(file)
formsense-microservice_1 | File "/usr/local/lib/python3.6/site-packages/pdf2image/pdf2image.py", line 98, in convert_from_path
formsense-microservice_1 | page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
formsense-microservice_1 | File "/usr/local/lib/python3.6/site-packages/pdf2image/pdf2image.py", line 485, in pdfinfo_from_path
formsense-microservice_1 | "Unable to get page count. Is poppler installed and in PATH?"
formsense-microservice_1 | pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
poppler_path=r'C:\Program Files (x86)\poppler-21.09.0\Library\bin'
i.e., use the r'' (raw string) format.
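Why the r prefix matters can be shown in a few lines. In a normal Python string literal, \b is the backspace character, so any Windows path ending in \bin is silently changed unless it is written as a raw string or with doubled backslashes. The shortened path below is illustrative only.

```python
plain = "poppler\bin"   # "\b" is interpreted as a backspace character (0x08)
raw = r"poppler\bin"    # raw string: the backslash is kept exactly as typed

print(repr(plain))      # 'poppler\x08in'
print(repr(raw))        # 'poppler\\bin'

# The plain version no longer names a folder called "bin".
print(plain.endswith("bin"), raw.endswith("bin"))
```

The same applies to the PDF path itself: a backslash followed by digits, as in a file name that starts with a number, is read as an octal escape in a normal string.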
D:\PythonProjects\KYC\venv\Scripts\python.exe D:\PythonProjects\KYC\Fileread.py D:\DMC\AI\process\HZSPS8769A D:\DMC\AI\Input\HZSPS8769A.pdf
Traceback (most recent call last):
  File "D:\PythonProjects\KYC\venv\lib\site-packages\pdf2image\pdf2image.py", line 568, in pdfinfo_from_path
    proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
  File "C:\Users\DELL\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\DELL\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\PythonProjects\KYC\Fileread.py", line 20, in
Process finished with exit code 1
How do I solve this? My poppler-23.10.0 folder does not have a bin folder.
To resolve the "PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?" issue, please follow these steps:
- Download the latest poppler zip file from here
- Unzip it to your preferred location: C:\Program Files (x86).
- After successfully unzipping the file, go to the poppler bin folder, copy its path, and add it to the system PATH variable:
'C:\Program Files (x86)\poppler-24.02.0\Library\bin'
- Restart VS Code or Jupyter Notebook.
Thank you, Divakar Kumar. I had already found a solution, and found several alternative methods as well. Anyway, thanks.
This Worked, Thank you
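For those who prefer passing poppler_path over editing the system PATH, the Windows steps above can be paired with a quick sanity check. This is a sketch under assumptions: check_poppler_dir and convert_with_poppler are names invented here, and the paths in the usage comment are examples. Only convert_from_path and its poppler_path parameter come from pdf2image, as used elsewhere in this thread.

```python
import os


def check_poppler_dir(poppler_dir):
    """Return the expected Poppler tools that are missing from `poppler_dir`."""
    suffix = ".exe" if os.name == "nt" else ""
    expected = ["pdfinfo" + suffix, "pdftoppm" + suffix]
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(poppler_dir, name))
    ]


def convert_with_poppler(pdf_path, poppler_dir):
    """Convert a PDF to images, failing early if the Poppler folder looks wrong."""
    missing = check_poppler_dir(poppler_dir)
    if missing:
        raise FileNotFoundError(f"{poppler_dir} is missing: {', '.join(missing)}")
    # Imported here so the folder check above works even without pdf2image installed.
    from pdf2image import convert_from_path
    return convert_from_path(pdf_path, poppler_path=poppler_dir)


# Example usage with hypothetical paths (note the raw strings):
# images = convert_with_poppler(
#     r"D:\docs\sample.pdf",
#     r"C:\Program Files (x86)\poppler-24.02.0\Library\bin",
# )
```

The early check gives a message that names the missing executable, which tends to be easier to act on than the generic "Is poppler installed and in PATH?" error.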
My Code:
images = convert_from_path('E:\Program Files\Python\Python37-32\SrcFiles\2234567895.pdf',poppler_path='E:\Program Files\Python\Lib\site-packages\pdf2image\poppler-0.68.0\bin')
for i, image in enumerate(images):
    fname = "image" + str(i) + ".tif"
    image.save(fname, "TIF")
Error Message:
Traceback (most recent call last):
  File "E:\Program Files\Python\lib\site-packages\pdf2image\pdf2image.py", line 420, in pdfinfo_from_path
    proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
  File "E:\Program Files\Python\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "E:\Program Files\Python\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:\Program Files\Python\Python37-32\Conv.py", line 67, in <module>
images = convert_from_path('E:\Program Files\Python\Python37-32\SrcFiles\2234567895.pdf',poppler_path='E:\Program Files\Python\Lib\site-packages\pdf2image\poppler-0.68.0\bin')
File "E:\Program Files\Python\lib\site-packages\pdf2image\pdf2image.py", line 94, in convert_from_path
page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
File "E:\Program Files\Python\lib\site-packages\pdf2image\pdf2image.py", line 441, in pdfinfo_from_path
raise PDFInfoNotInstalledError(
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
pdfinfo response:
Kindly help : suresh.darsru@gmail.com