RWMostert / pdf-to-image-lambda

A Chalice-based AWS Lambda function to convert a PDF document in a source S3 bucket to an image.

Poppler not found error #1

Open mehullala opened 4 years ago

mehullala commented 4 years ago

Hi,

I have followed all the steps, and the poppler files are under the same path you described, but I am still getting this error. Please help me out.

[ERROR] PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
Traceback (most recent call last):
  File "/var/task/chalice/app.py", line 1429, in __call__
    return self.func(event_obj)
  File "/var/task/app.py", line 75, in pdf_to_image
    images = convert_from_bytes(infile,
  File "/var/task/pdf2image/pdf2image.py", line 260, in convert_from_bytes
    return convert_from_path(
  File "/var/task/pdf2image/pdf2image.py", line 94, in convert_from_path
    page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
  File "/var/task/pdf2image/pdf2image.py", line 441, in pdfinfo_from_path
    raise PDFInfoNotInstalledError(

swpnk commented 4 years ago

I am getting the same error. Were you able to fix it?

swpnk commented 4 years ago

Hi,

I have found the solution for this. You need to download poppler binaries built for Lambda and add them as a layer. In the code, change the /var/task/ .. path to /opt/bin.

It should start working now.

billsinc commented 4 years ago

@swpnk I didn't have any luck with this no matter which path I entered or method I used. Can you elaborate?

swpnk commented 4 years ago

Hi @billsinc, sure. Basically, download the poppler binaries and add them as a layer for the Lambda function. Layer contents are extracted under /opt, so to refer to the binaries in the layer we use the path /opt/bin.

I can give you the poppler binaries in a folder if you would like.
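As a rough sketch of the path change described above, the snippet below picks the first directory that actually contains an executable pdfinfo and returns it for use as poppler_path. The helper name find_poppler_path and the candidate directories are assumptions for illustration, not part of the repo.

```python
import shutil

# Hypothetical candidate directories; adjust to wherever your layer unpacks.
CANDIDATE_DIRS = ("/opt/bin", "/var/task/bin")


def find_poppler_path(candidates=CANDIDATE_DIRS):
    """Return the first directory holding an executable pdfinfo, else None."""
    for directory in candidates:
        # With an explicit path argument, shutil.which searches only that directory.
        if shutil.which("pdfinfo", path=directory) is not None:
            return directory
    return None
```

The result can be passed as poppler_path to convert_from_bytes. If it is None, pdf2image falls back to looking the tools up on PATH.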

billsinc commented 4 years ago

@swpnk It looks like the binaries are already there, so I added "automatic_layer": true to config.json and tried both of the following paths (based on where I was told they would exist as a layer). Neither worked.

poppler_path = '/opt/lib/poppler-utils-0.26/usr/bin'

or

poppler_path = '/opt/python/lib/python3.8/site-packages/lib/poppler-utils-0.26/usr/bin'
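When it is unclear which directory the binaries ended up in, one option is to log what is actually under /opt from inside the function. This is a minimal sketch; locate_binaries is a hypothetical helper, and it assumes the tools of interest are pdfinfo and pdftoppm, which pdf2image calls.

```python
import os


def locate_binaries(root="/opt", names=("pdfinfo", "pdftoppm")):
    """Walk root and return {full_path: is_executable} for each matching file."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in names:
                full = os.path.join(dirpath, name)
                # A file that exists but is not executable fails the same way
                # as a missing one, so report the permission as well.
                found[full] = os.access(full, os.X_OK)
    return found
```

Printing locate_binaries() at the top of the handler shows the real directory to use for poppler_path, and whether the files kept their execute permission when packaged.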

swpnk commented 4 years ago

I think the problem is with the binaries. I can share the ones I have if you like. The first ones I found from a casual search did not work either.

Sorry, it's all muddled, as this was over a month ago.

joshisachin675 commented 4 years ago

@swpnk @billsinc @RWMostert Is this issue fixed? I am facing the same problem. I am on Windows, and after deploying the code to AWS Lambda I get this error:

{
  "errorMessage": "Unable to get page count. Is poppler installed and in PATH?",
  "errorType": "PDFInfoNotInstalledError",
  "stackTrace": [
    " File \"/var/task/chalice/app.py\", line 1445, in __call__\n return self.handler(event_obj)\n",
    " File \"/var/task/app.py\", line 75, in pdf_to_image\n images = convert_from_bytes(infile,\n",
    " File \"/var/task/pdf2image/pdf2image.py\", line 270, in convert_from_bytes\n return convert_from_path(\n",
    " File \"/var/task/pdf2image/pdf2image.py\", line 97, in convert_from_path\n page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)[\"Pages\"]\n",
    " File \"/var/task/pdf2image/pdf2image.py\", line 467, in pdfinfo_from_path\n raise PDFInfoNotInstalledError(\n"
  ]
}

If you fix this issue, please let us know. It's a blocker for us.

jimjeffers commented 3 years ago

I got around this by installing the lambda layer from this repo: https://github.com/jeylabs/aws-lambda-poppler-layer

  1. I followed the directions in the README for the poppler layer: download the latest zip from releases and manually upload it as a layer in the AWS Lambda console.
  2. Add the ARN of the new layer as {..."layers": [<ARN>], ...} in .chalice/config.json.
  3. Remove the poppler_path='...' argument from this line: https://github.com/RWMostert/pdf-to-image-lambda/blob/master/app.py#L78

pdf2image should now be able to find poppler via the Lambda layer.
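Step 2 can also be scripted. The sketch below adds a layer ARN to a Chalice config dict, assuming the top-level "layers" key shown in the steps above. The add_layer helper, the app_name value, and the ARN are placeholders, not real values.

```python
import json


def add_layer(config, layer_arn):
    """Return a copy of a Chalice config dict with layer_arn in its "layers" list."""
    updated = dict(config)
    layers = list(updated.get("layers", []))
    if layer_arn not in layers:  # avoid duplicate entries on repeated runs
        layers.append(layer_arn)
    updated["layers"] = layers
    return updated


# Placeholder ARN and app name, for illustration only.
PLACEHOLDER_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:layer:poppler:1"
example = add_layer({"version": "2.0", "app_name": "pdf-to-image"}, PLACEHOLDER_ARN)
print(json.dumps(example, indent=2))
```

In practice the dict would be loaded from .chalice/config.json with json.load and written back with json.dump before redeploying.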

abhishekaccenture commented 3 years ago

Did anyone get around to fixing this? @jimjeffers, by removing the poppler_path, do you mean removing the statement altogether or replacing it with a different declaration? I tried a bunch of things, but it always seems to give back the same error!

[ERROR] PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
Traceback (most recent call last):
  File "/var/task/chalice/app.py", line 1595, in __call__
    return self.handler(event_obj)
  File "/var/task/app.py", line 73, in pdf_to_image
    images = convert_from_bytes(infile,
  File "/opt/python/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 283, in convert_from_bytes
    return convert_from_path(
  File "/opt/python/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 98, in convert_from_path
    page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
  File "/opt/python/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 484, in pdfinfo_from_path
    raise PDFInfoNotInstalledError(