Lambda layers for Python 3.12 PDF raising an exception on missing libpng16.so.16

Viajante80 commented 5 months ago

Using the textractor-lambda-p312-pdf artifact from lambda-layers run 50 (https://github.com/aws-samples/amazon-textract-textractor/actions/runs/9550648081), I get the following error:

"errorMessage": "Unable to get page count.\npdfinfo: error while loading shared libraries: libpng16.so.16: cannot open shared object file: No such file or directory\n",
"errorType": "PDFPageCountError",
"stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 27, in lambda_handler\n    textract = extractor.start_document_analysis(\n",
    "  File \"/opt/python/textractor/textractor.py\", line 575, in start_document_analysis\n    images = self._get_document_images_from_path(original_file_source)\n",
    "  File \"/opt/python/textractor/textractor.py\", line 133, in _get_document_images_from_path\n    images = convert_from_bytes(bytearray(file_obj))\n",
    "  File \"/opt/python/pdf2image/pdf2image.py\", line 359, in convert_from_bytes\n    return convert_from_path(\n",
    "  File \"/opt/python/pdf2image/pdf2image.py\", line 127, in convert_from_path\n    page_count = pdfinfo_from_path(\n",
    "  File \"/opt/python/pdf2image/pdf2image.py\", line 611, in pdfinfo_from_path\n    raise PDFPageCountError(\n"
  ]

Belval commented 5 months ago

This is probably the same issue as #372, but with a different library. It seems a newer version of the Lambda environment includes the version number in its library file names.

The change would go here: https://github.com/aws-samples/amazon-textract-textractor/blob/master/.github/workflows/lambda_layers.yml#L355

We will address this issue by the end of the day. Thank you for flagging it.
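For context, the dynamic loader looks a library up by the exact name recorded in the binary (for example `libpng16.so.16`), so a layer has to ship the file under that versioned name. The workflow change itself isn't shown in this thread. As a rough sketch of one common approach, assuming you have `ldd` output for `pdfinfo` from the build environment, the snippet below maps each resolved library to its path and copies it into the layer's `lib/` directory, which Lambda mounts under `/opt/lib` (a directory on the runtime's `LD_LIBRARY_PATH`). The function names, the `SYSTEM_LIBS` skip list, and the layer layout are illustrative assumptions, not what `lambda_layers.yml` actually does.

```python
import re
import shutil
from pathlib import Path

# Matches resolved ldd lines such as:
#   libpng16.so.16 => /usr/lib64/libpng16.so.16 (0x00007f...)
# "not found" lines, the vdso line and the ld-linux line do not match.
_RESOLVED = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)\s+\(0x[0-9a-fA-F]+\)")

# Assumption: core glibc libraries come from the runtime and should not be bundled.
SYSTEM_LIBS = {"libc.so.6", "libm.so.6", "libpthread.so.0", "libdl.so.2", "librt.so.1"}


def resolved_libraries(ldd_output: str) -> dict[str, str]:
    """Map each soname that ldd resolved to the path it was resolved to."""
    libs = {}
    for line in ldd_output.splitlines():
        match = _RESOLVED.match(line)
        if match:
            libs[match.group(1)] = match.group(2)
    return libs


def copy_into_layer(libs: dict[str, str], layer_root: Path) -> list[Path]:
    """Copy libraries into <layer_root>/lib under their versioned soname."""
    lib_dir = layer_root / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for soname, path in libs.items():
        if soname in SYSTEM_LIBS:
            continue  # provided by the Lambda runtime (assumed)
        dest = lib_dir / soname  # keep the exact name, e.g. libpng16.so.16
        shutil.copy2(path, dest)  # follows symlinks, so the real file is copied
        copied.append(dest)
    return copied
```

Naming each copy after the soname, rather than the unversioned `libpng16.so`, is the point: the loader never looks for the unversioned name at run time.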

Viajante80 commented 5 months ago

Thank you @Belval. I tested build 51 and got a new error:

Response:
{
  "errorMessage": "Unable to get page count.\npdfinfo: error while loading shared libraries: libplc4.so: cannot open shared object file: No such file or directory\n",
  "errorType": "PDFPageCountError",
  "requestId": "5626e07d-6d35-4698-a0d9-c01447b43502",
  "stackTrace": [
    " File \"/var/task/lambda_function.py\", line 27, in lambda_handler\n textract = extractor.start_document_analysis(\n",
    " File \"/opt/python/textractor/textractor.py\", line 575, in start_document_analysis\n images = self._get_document_images_from_path(original_file_source)\n",
    " File \"/opt/python/textractor/textractor.py\", line 133, in _get_document_images_from_path\n images = convert_from_bytes(bytearray(file_obj))\n",
    " File \"/opt/python/pdf2image/pdf2image.py\", line 359, in convert_from_bytes\n return convert_from_path(\n",
    " File \"/opt/python/pdf2image/pdf2image.py\", line 127, in convert_from_path\n page_count = pdfinfo_from_path(\n",
    " File \"/opt/python/pdf2image/pdf2image.py\", line 611, in pdfinfo_from_path\n raise PDFPageCountError(\n"
  ]
}
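Because the loader aborts at the first library it cannot open, fixing `libpng16.so.16` only surfaced the next missing one (`libplc4.so`). One way to see every gap at once is to run `ldd` against the `pdfinfo` binary with the same `LD_LIBRARY_PATH` as the Lambda runtime (for example inside a matching container) and collect every line marked `not found`. This is a minimal sketch that assumes `ldd` is installed in that environment; `missing_libraries`, `check_binary`, and the `/opt/bin/pdfinfo` path below are hypothetical.

```python
import shutil
import subprocess


def missing_libraries(ldd_output: str) -> list[str]:
    """Return every soname that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Lines look like "\tlibplc4.so => not found"
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def check_binary(binary: str) -> list[str]:
    """Run ldd on `binary` and list its unresolved libraries (assumes ldd exists)."""
    if shutil.which("ldd") is None:
        raise RuntimeError("ldd is not available in this environment")
    # ldd prints its report on stdout; a non-zero exit just means "not dynamic"
    result = subprocess.run(["ldd", binary], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)
```

For example, `print(check_binary("/opt/bin/pdfinfo"))` would list all unresolved sonames in one pass instead of one per deployment.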

Belval commented 5 months ago

This is fixed in the latest lambda layer version.

Lambda layers for Python 3.12 PDF raising an exception on missing libpng16.so.16 #373