aws-samples / amazon-textract-serverless-large-scale-document-processing

Process documents at scale using Amazon Textract
Apache License 2.0

Files with more than 200 pages are not completely extracted #22

Open anjankumarv opened 3 years ago

anjankumarv commented 3 years ago

I noticed a consistent problem with larger files: PDF files with more than 200 pages are never completely extracted. The files either remain unprocessed (no analysis folder is created) or are only partially extracted (40-50 pages), even after 1-2 days. Files of 100 pages or fewer are extracted within 1-5 minutes. The pipeline is not being flooded with large files; the problem is the same even if I upload a single large file (200 pages) in a day. Please let me know if I am missing something for larger files. Thank you.
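
One way to narrow this down is to check whether Textract itself returned every page or whether pages are being lost when the results are written out. The following is only a minimal sketch, not code from this repository. It assumes the job was started with `StartDocumentAnalysis` and that its `JobId` is known; the helper name `count_extracted_pages` is hypothetical. It follows `NextToken` through `get_document_analysis` and compares the page count Textract reports with the distinct page numbers actually present in the returned blocks.

```python
def count_extracted_pages(client, job_id, max_results=1000):
    """Page through GetDocumentAnalysis results for one job.

    `client` is assumed to be a boto3 Textract client (or any object with a
    compatible get_document_analysis method). Returns a tuple of
    (job status, page count reported by Textract, distinct pages seen in blocks).
    """
    pages_seen = set()
    status = None
    reported_pages = None
    next_token = None

    while True:
        kwargs = {"JobId": job_id, "MaxResults": max_results}
        if next_token:
            kwargs["NextToken"] = next_token
        response = client.get_document_analysis(**kwargs)

        status = response.get("JobStatus")
        metadata = response.get("DocumentMetadata", {})
        reported_pages = metadata.get("Pages", reported_pages)

        # Each block carries the number of the page it was found on.
        for block in response.get("Blocks", []):
            if "Page" in block:
                pages_seen.add(block["Page"])

        # Results are paginated; stop once no further token is returned.
        next_token = response.get("NextToken")
        if not next_token:
            break

    return status, reported_pages, len(pages_seen)
```

Calling it as `count_extracted_pages(boto3.client("textract"), job_id)` would show whether a 200-page job reports `SUCCEEDED` with all pages present. If it does, the missing pages are more likely lost while the paginated results are being fetched and stored than in Textract itself. For jobs started with text detection, `get_document_text_detection` has the same response shape.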