aws-samples / amazon-textract-response-parser

Parse JSON response of Amazon Textract
Apache License 2.0
218 stars 95 forks

How to get row details which aren't part of table from the response #161

Closed shreyasroshan closed 6 months ago

shreyasroshan commented 1 year ago

From the response I was able to visualise the table data using the parser, but how do I obtain the non-tabular data from the response?

athewsey commented 11 months ago

Hi @shreyasroshan, the answer to this will depend a bit on what language you're using TRP with (Python? JavaScript?) - but either way it should be possible.

In Python, you should be able to loop through page.form.fields for key-value form fields or page.lines for plain text lines, as mentioned in the Python readme.
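For illustration, here is a minimal sketch of that Python loop. The helper name collect_non_table_text is hypothetical and not part of TRP. It assumes the parsed document exposes doc.pages, page.lines and page.form.fields, with a .text attribute on lines, keys and values, and that a field's key or value may be missing.

```python
def collect_non_table_text(doc):
    """Gather plain text lines and key-value fields, page by page.

    `doc` is assumed to behave like a parsed TRP document:
    doc.pages -> page.lines / page.form.fields, each with .text.
    """
    results = []
    for page_number, page in enumerate(doc.pages, start=1):
        # Plain text lines, in the order the parser returns them
        lines = [line.text for line in page.lines]

        # Key-value pairs; kept as a list so repeated keys are not lost
        fields = []
        for field in page.form.fields:
            key = field.key.text if field.key else ""
            value = field.value.text if field.value else ""
            fields.append((key, value))

        results.append({"page": page_number, "lines": lines, "fields": fields})
    return results


# Possible usage (assumes `response` holds the Textract JSON as a dict):
# from trp import Document
# doc = Document(response)
# for page in collect_non_table_text(doc):
#     print(page["lines"], page["fields"])
```

Note that page.lines generally covers all detected text on the page, so text that also sits inside a table may appear here too; filtering that out would need an extra step.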

In JavaScript/TypeScript, you should similarly be able to loop through plain text lines, form fields, or query results as outlined in the doc.

It sounds from the title like you might be struggling with some rows of a table not coming through in the tables response? In which case you might need to look at the plain text lines and apply some position-based rules... But not quite sure. If you have any more specific details please do let us know!
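As a rough sketch of what a position-based rule could look like: the function below groups text lines into rows when their top coordinates are close together, then orders each row left to right. Everything here is an assumption for illustration: the function name, the (text, top, left) tuple input, and the 0.01 tolerance, which presumes Textract's normalised 0-1 page coordinates. In Python TRP the coordinates should be reachable from each line's geometry (something like line.geometry.boundingBox.top and .left), but please check that against the version you have installed.

```python
def group_lines_into_rows(lines, tolerance=0.01):
    """Cluster (text, top, left) tuples into rows by vertical position.

    A line joins the current row when its top is within `tolerance`
    of the top of the first line in that row.
    """
    rows = []
    for text, top, left in sorted(lines, key=lambda item: item[1]):
        if rows and abs(top - rows[-1]["top"]) <= tolerance:
            rows[-1]["cells"].append((left, text))
        else:
            rows.append({"top": top, "cells": [(left, text)]})
    # Order the cells of each row from left to right and keep only the text
    return [[text for _, text in sorted(row["cells"])] for row in rows]


sample = [
    ("Total", 0.50, 0.1),
    ("42.00", 0.505, 0.8),
    ("Item", 0.30, 0.1),
    ("Qty", 0.302, 0.5),
]
print(group_lines_into_rows(sample))
```

The tolerance will likely need tuning per document, since skewed scans or mixed font sizes can shift the top of neighbouring cells.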

athewsey commented 6 months ago

Closing due to inactivity - please feel free to re-open with further details if this is still causing problems.