Lambda function takes the document to Textract to extract the text

bahrain-uob / moe-questions-bank

This repository will include the MOE Questions Bank challenge artifacts


Lambda function takes the document to Textract to extract the text #7

Open Exortix opened 1 week ago

Exortix commented 1 week ago

Textract lambda function does not return the whole text from the file
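One possible cause, not confirmed in this thread: results from Textract's asynchronous text detection are returned in pages, and reading only the first response drops the rest. A minimal sketch of collecting every page, assuming a boto3 Textract client is passed in and the job has already succeeded (the helper name `collect_lines` is hypothetical):

```python
def collect_lines(textract_client, job_id):
    """Join the text of every LINE block from a finished Textract job.

    Assumes the job was started with start_document_text_detection and
    has already reached JobStatus == "SUCCEEDED" (no polling is done here).
    """
    lines = []
    next_token = None
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            kwargs["NextToken"] = next_token
        page = textract_client.get_document_text_detection(**kwargs)
        for block in page.get("Blocks", []):
            # PAGE and WORD blocks are skipped; LINE blocks carry the text.
            if block.get("BlockType") == "LINE":
                lines.append(block.get("Text", ""))
        # A missing NextToken means this was the last page of results.
        next_token = page.get("NextToken")
        if not next_token:
            break
    return "\n".join(lines)
```

In a Lambda this would be called as `collect_lines(boto3.client("textract"), job_id)`. The client is a parameter so the function can be exercised with a stub.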

Exortix commented 1 week ago

Add lambda function: extract-text

Mahmood-Alalwan commented 1 week ago

Fixed the issue with the Lambda code for extracting files, but used a different S3 bucket and Lambda function. "myfunctionMA" is now the working Textract Lambda function.

fedaabd commented 5 days ago

What is the conclusion? The Lambda is working, but is this enough for the solution, or do we need to investigate further steps for the coming spring? Textract is an option, but we need to invest in arranging the format of the document, or in another service or approach.

Mahmood-Alalwan commented 5 days ago

The Lambda for Textract is fully functional and successfully stores the .txt files in the "extracted files" folder in the same S3 bucket. For now this is not enough for our solution, and further arranging or changing the format of the PDF files might be needed.
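For reference, the storage step described above could look roughly like the following. This is only a sketch, not the actual "myfunctionMA" code: the helper names `output_key_for` and `store_text` are hypothetical, and it assumes a boto3 S3 client is passed in and that the folder is the key prefix `extracted files/`.

```python
import posixpath


def output_key_for(source_key, prefix="extracted files/"):
    """Map a source object key to a .txt key under the output prefix."""
    base = posixpath.basename(source_key)
    stem, _ext = posixpath.splitext(base)
    return f"{prefix}{stem}.txt"


def store_text(s3_client, bucket, source_key, text):
    """Write the extracted text back to the same bucket and return its key."""
    key = output_key_for(source_key)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=text.encode("utf-8"),
        ContentType="text/plain; charset=utf-8",
    )
    return key
```

With this mapping, a source object such as `uploads/exam.pdf` (an illustrative key) would be written to `extracted files/exam.txt`.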