Closed: Viajante80 closed this issue 4 months ago
Probably the same issue as #372, but with a different library. It looks like a newer version of the Lambda environment now puts version numbers in the file names of its shared libraries.
Change would be here: https://github.com/aws-samples/amazon-textract-textractor/blob/master/.github/workflows/lambda_layers.yml#L355
We will address this issue by the end of the day, thank you for flagging it.
Thank you, @Belval. I tested build 51 and got a new error:
Response
{
  "errorMessage": "Unable to get page count.\npdfinfo: error while loading shared libraries: libplc4.so: cannot open shared object file: No such file or directory\n",
  "errorType": "PDFPageCountError",
  "requestId": "5626e07d-6d35-4698-a0d9-c01447b43502",
  "stackTrace": [
    " File \"/var/task/lambda_function.py\", line 27, in lambda_handler\n textract = extractor.start_document_analysis(\n",
    " File \"/opt/python/textractor/textractor.py\", line 575, in start_document_analysis\n images = self._get_document_images_from_path(original_file_source)\n",
    " File \"/opt/python/textractor/textractor.py\", line 133, in _get_document_images_from_path\n images = convert_from_bytes(bytearray(file_obj))\n",
    " File \"/opt/python/pdf2image/pdf2image.py\", line 359, in convert_from_bytes\n return convert_from_path(\n",
    " File \"/opt/python/pdf2image/pdf2image.py\", line 127, in convert_from_path\n page_count = pdfinfo_from_path(\n",
    " File \"/opt/python/pdf2image/pdf2image.py\", line 611, in pdfinfo_from_path\n raise PDFPageCountError(\n"
  ]
}
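For anyone who wants to check whether other shared libraries are missing besides libplc4.so, here is a minimal sketch that parses `ldd` output for the `pdfinfo` binary. It is not part of Textractor or pdf2image. The helper names `missing_shared_libs` and `check_binary` are made up for illustration, and it assumes `ldd` is available and `pdfinfo` is on `PATH` in whatever environment you run it in (for example, a container that mirrors the Lambda runtime with the layer unpacked).

```python
import shutil
import subprocess


def missing_shared_libs(ldd_output: str) -> list[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved dependencies as "<name> => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(name: str = "pdfinfo") -> list[str]:
    """Run ldd on a binary found on PATH and list its unresolved libraries.

    Hypothetical helper: assumes ldd exists in the current environment.
    """
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name} is not on PATH")
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_shared_libs(result.stdout)
```

Calling `check_binary()` returns a list such as the libraries named in the error message, which may make it easier to report every missing file in one go instead of hitting them one at a time.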
This is fixed in the latest lambda layer version.
lambda-layers run 50: https://github.com/aws-samples/amazon-textract-textractor/actions/runs/9550648081 (artifact: textractor-lambda-p312-pdf)