aws-samples / amazon-textract-code-samples

Amazon Textract Code Samples
MIT No Attribution

Broken blocks relations #16

Closed Eitol closed 2 years ago

Eitol commented 2 years ago

When a response contains more blocks than the "MaxResults" parameter allows, parsing the document fails.

Therefore the parser must be able to handle broken references.

The failure is reproduced in the following repository:

https://github.com/Eitol/test_problematic_file

schadem commented 2 years ago

The Textract JSON response does not contain broken references; it is consistent. I have seen this symptom when, for an asynchronous call, not all parts of the paginated response are pulled and combined. Take a look at https://github.com/aws-samples/amazon-textract-textractor/tree/master/caller for a higher-level call that does that, or at the sample functions that get the paginated results.
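
For illustration, here is a minimal sketch of pulling and combining every page of an asynchronous result before parsing. It is not the code from the linked caller or from the sample functions. It assumes the job has already succeeded and that `textract_client` behaves like a boto3 Textract client, whose `get_document_analysis` accepts `JobId`, `MaxResults` and `NextToken`. The helper names `get_all_blocks` and `find_dangling_ids` are hypothetical.

```python
def get_all_blocks(textract_client, job_id, max_results=1000):
    """Collect the Blocks from every page of an asynchronous Textract job.

    Assumption: the job identified by job_id has already finished successfully.
    """
    blocks = []
    next_token = None
    while True:
        kwargs = {"JobId": job_id, "MaxResults": max_results}
        if next_token:
            kwargs["NextToken"] = next_token
        response = textract_client.get_document_analysis(**kwargs)
        blocks.extend(response.get("Blocks", []))
        # A missing NextToken means this was the last page.
        next_token = response.get("NextToken")
        if not next_token:
            return blocks


def find_dangling_ids(blocks):
    """Return the relationship Ids that point at blocks not in the list."""
    known_ids = {block["Id"] for block in blocks}
    return {
        ref_id
        for block in blocks
        for relationship in block.get("Relationships", [])
        for ref_id in relationship.get("Ids", [])
        if ref_id not in known_ids
    }
```

With boto3 this would be called roughly as `get_all_blocks(boto3.client("textract"), job_id)`. If `find_dangling_ids` returns a non-empty set for the combined list, some pages are probably still missing, which matches the symptom described in this issue.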