aws-samples / amazon-textract-serverless-large-scale-document-processing

Process documents at scale using Amazon Textract
Apache License 2.0

Some PDFs not generating any outputs #15

Open neilc85 opened 4 years ago

neilc85 commented 4 years ago

On the whole this is working really well, but we have found that some PDFs we upload to the textractpipelinestack-documentsbucketxxx bucket are not processed completely.

They do still seem to be processed by Textract, as the charges are still being applied to our account, but the outputs are never written to the bucket as the code is expected to do.

I cannot seem to find any error messages suggesting that it isn't working.

I can provide example documents if needed. A number of them work completely fine, but others simply produce no output.

I have also raised a Support ticket, but they suggested that I raise an issue here as well.

We want to use Textract quite heavily when we move into product build / production, but we are concerned that some documents aren't being picked up.
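One way to narrow down which uploads are affected is to list the bucket and compare input PDFs against output objects. The following is only a rough sketch, not part of the pipeline: the helper names `find_missing_outputs` and `list_bucket_keys` are hypothetical, and it assumes outputs are written under a prefix made of the original document key followed by `-analysis/`. Adjust `marker` if your deployment writes outputs elsewhere.

```python
from typing import Iterable, List


def find_missing_outputs(keys: Iterable[str], marker: str = "-analysis/") -> List[str]:
    """Return input PDF keys that have no output objects under '<key><marker>...'.

    ASSUMPTION: outputs live under the document key plus `marker`.
    """
    keys = list(keys)
    # Document keys for which at least one output object exists.
    processed = {k.split(marker, 1)[0] for k in keys if marker in k}
    # Input documents: PDFs that are not themselves output objects.
    inputs = [k for k in keys if marker not in k and k.lower().endswith(".pdf")]
    return sorted(k for k in inputs if k not in processed)


def list_bucket_keys(bucket: str) -> List[str]:
    """List every object key in the bucket (hypothetical helper, needs boto3)."""
    import boto3  # imported lazily so the comparison above works without AWS access

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    return [
        obj["Key"]
        for page in paginator.paginate(Bucket=bucket)
        for obj in page.get("Contents", [])
    ]
```

Calling `find_missing_outputs(list_bucket_keys("<your documents bucket>"))` would then give the list of PDFs to check against the Lambda logs or attach to this issue.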

melbit-tomcuddihy commented 4 years ago

Hi @neilc85, did you end up figuring out what the problem was here?