KeyError: 'Text' - on documents with tables

Hello,

I have a fairly ordinary-looking document (unfortunately I cannot share the original file, as it is proprietary) that textractprettyprinter.t_pretty_print.get_text_from_layout_json fails to parse, raising KeyError: 'Text'.

We've traced it to the following problem:

The document in question contains a screenshot of a table in which one of the cells has a selection.

We suspect this in turn triggers the error at the following line:

File "/app/.venv/lib/python3.11/site-packages/textractprettyprinter/t_pretty_print_layout.py", line 111, in _dfs
    cell_text = " ".join([id2block[line_id]['Text'] for line_id in cell_block["Relationships"][0]['Ids']])

To inspect the root cause, I added a try/except around that line in the original source file. [Screenshot: error]

It appears that the branch of the _dfs() function that handles tables should check that the blocks a cell references actually contain a Text property (or, alternatively, use something like .get('Text', '')).
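A minimal sketch of the .get() option, assuming the offending child is a block with no Text key (for example a SELECTION_ELEMENT, which carries SelectionStatus instead). The helper name cell_text and the sample blocks below are hypothetical and only mimic the shapes seen in the traceback; this is not the library's actual code:

```python
def cell_text(cell_block, id2block):
    """Join the text of a cell's child blocks, skipping children that have no 'Text'."""
    # A cell may have no Relationships at all (assumption: empty cells).
    relationships = cell_block.get("Relationships") or []
    child_ids = relationships[0].get("Ids", []) if relationships else []
    # .get('Text', '') avoids the KeyError for children without a Text key.
    texts = (id2block.get(child_id, {}).get("Text", "") for child_id in child_ids)
    # Drop empty strings so a skipped child does not leave a double space.
    return " ".join(t for t in texts if t)


# Hypothetical blocks: two words plus a selection element that has no 'Text'.
id2block = {
    "w1": {"BlockType": "WORD", "Text": "Option"},
    "w2": {"BlockType": "WORD", "Text": "A"},
    "s1": {"BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED"},
}
cell_block = {
    "BlockType": "CELL",
    "Relationships": [{"Type": "CHILD", "Ids": ["w1", "s1", "w2"]}],
}

print(cell_text(cell_block, id2block))
```

With the original list comprehension, the same input would raise KeyError: 'Text' on the "s1" child. Whether the selection state should be rendered in the cell text rather than silently skipped is a separate design question for the maintainers.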
