aws-samples / amazon-textract-serverless-large-scale-document-processing

Process documents at scale using Amazon Textract
Apache License 2.0

How to get particular page Form-data. #41

Open SharathTF opened 2 years ago

SharathTF commented 2 years ago

I have uploaded the entire PDF file to an S3 bucket via the amazons3Client.PutObjectAsync method and started the analysis with the textractClient.StartDocumentTextDetectionAsync method. When I fetch the response with the textractClient.GetDocumentTextDetectionAsync method, I can see the data for all pages and am able to separate out the raw/line data. My problem is: how can I get the FormData and TableData for a particular page (say I need FormData only for page 3)? Kindly advise.

tb102122 commented 2 years ago

You can use the response parser helper for Python, or you can convert the JSON response to a dict and filter the blocks by their page attribute.

https://github.com/aws-samples/amazon-textract-response-parser/blob/master/src-python/README.md
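The second suggestion (convert the JSON to a dict and filter by page) could look roughly like the sketch below. It assumes the blocks come from a document analysis job started with the FORMS and TABLES feature types, since text detection alone returns only PAGE, LINE and WORD blocks, and that all paginated `Blocks` lists have already been merged into one list. The function names and `SAMPLE_BLOCKS` are made up for illustration and are not part of any AWS library; checkbox (SELECTION_ELEMENT) values are ignored.

```python
def blocks_on_page(blocks, page, block_type):
    """Return the blocks of one type (e.g. "TABLE") detected on one page."""
    return [
        b for b in blocks
        if b.get("Page", 1) == page and b["BlockType"] == block_type
    ]


def _child_text(block, by_id):
    """Join the text of the WORD blocks referenced as CHILD of `block`."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def form_fields_for_page(blocks, page):
    """Return {key text: value text} for the form fields on one page."""
    by_id = {b["Id"]: b for b in blocks}
    fields = {}
    for key_block in blocks_on_page(blocks, page, "KEY_VALUE_SET"):
        # KEY_VALUE_SET covers both halves of a field; start from the KEY.
        if "KEY" not in key_block.get("EntityTypes", []):
            continue
        values = []
        for rel in key_block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                values.extend(_child_text(by_id[i], by_id) for i in rel["Ids"])
        fields[_child_text(key_block, by_id)] = " ".join(values)
    return fields


def _word(block_id, page, text):
    # Hypothetical helper used only to build the sample data below.
    return {"Id": block_id, "BlockType": "WORD", "Page": page, "Text": text}


# Hand-written stand-in for a merged Textract response (illustrative only).
SAMPLE_BLOCKS = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "Page": 1, "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "Page": 1, "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    _word("w1", 1, "Date:"),
    _word("w2", 1, "2020"),
    {"Id": "k2", "BlockType": "KEY_VALUE_SET", "Page": 3, "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v2"]},
                       {"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "v2", "BlockType": "KEY_VALUE_SET", "Page": 3, "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w4", "w5"]}]},
    _word("w3", 3, "Name:"),
    _word("w4", 3, "Jane"),
    _word("w5", 3, "Doe"),
    {"Id": "t1", "BlockType": "TABLE", "Page": 3},
]

if __name__ == "__main__":
    print(form_fields_for_page(SAMPLE_BLOCKS, 3))
    print([b["Id"] for b in blocks_on_page(SAMPLE_BLOCKS, 3, "TABLE")])
```

The same `Page` filter applies to TABLE and CELL blocks, so table data for a single page can be collected the same way before walking the cell relationships.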