Closed Eitol closed 2 years ago
The Textract JSON response does not include broken references; it is internally consistent. I have seen this error when, for an asynchronous call, not all parts of the paginated response are pulled and combined. Take a look at https://github.com/aws-samples/amazon-textract-textractor/tree/master/caller for a higher-level call that does this, or at the sample functions that fetch the paginated results.
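For illustration, here is a minimal sketch of pulling and combining every page of an asynchronous result. It assumes a boto3 Textract client, a job started with `start_document_analysis` that has already finished successfully, and the documented `NextToken` pagination of `get_document_analysis`. The helper name `get_all_blocks` is hypothetical and is not part of any library.

```python
from typing import Any, Dict, List, Optional


def get_all_blocks(client: Any, job_id: str, max_results: int = 1000) -> List[Dict[str, Any]]:
    """Collect the Blocks from every page of an asynchronous Textract job.

    `client` is assumed to behave like boto3.client("textract").
    The job is assumed to have finished successfully already.
    """
    blocks: List[Dict[str, Any]] = []
    next_token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"JobId": job_id, "MaxResults": max_results}
        if next_token:
            # Ask for the page that follows the one we just read.
            kwargs["NextToken"] = next_token
        page = client.get_document_analysis(**kwargs)
        blocks.extend(page.get("Blocks", []))
        next_token = page.get("NextToken")
        if not next_token:
            # No NextToken means this was the last page.
            return blocks
```

The combined list can then be wrapped as `{"Blocks": blocks}` before it is handed to a parser, so that every referenced Id is present in the same document.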
When the document has more blocks than the "MaxResults" parameter allows in one response, parsing of the document fails.
Therefore the parser must be able to handle broken references.
The failure is reproduced in the following repository:
https://github.com/Eitol/test_problematic_file
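To check whether a given response actually contains broken references before parsing it, something like the following sketch may help. It assumes the usual Textract block layout, where each block has an `Id` and an optional `Relationships` list whose entries carry an `Ids` list. The function name `find_dangling_references` is made up for this example.

```python
from typing import Any, Dict, List


def find_dangling_references(blocks: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each block Id to the referenced Ids that are missing from `blocks`."""
    known_ids = {block["Id"] for block in blocks}
    dangling: Dict[str, List[str]] = {}
    for block in blocks:
        # "Relationships" may be absent or None on leaf blocks such as WORD.
        for relationship in block.get("Relationships") or []:
            missing = [ref for ref in relationship.get("Ids", []) if ref not in known_ids]
            if missing:
                dangling.setdefault(block["Id"], []).extend(missing)
    return dangling
```

An empty result means every reference resolves. A non-empty result on a single page of a paginated response suggests that the remaining pages have not been merged in yet.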