trp2.get_block_by_id

Shayndee commented 5 months ago

If self.find_block_by_id returns no block, get_block_by_id raises a ValueError (no block for id) and the rest of the page then fails to parse.

athewsey commented 5 months ago

I'm not deep on the Python version of the library, but from what I understand this may be by design... What's your expected behaviour for missing blocks referenced in the response, @Shayndee?

e.g.

Raise a validation error at load/parse time?
Gracefully ignore missing blocks at load/parse time, but raise an error when attempting to access them later?
Gracefully ignore missing blocks altogether, wherever they're referenced?

athewsey commented 3 months ago

Following up on this after diving a bit deeper:

TDocument provides two alternative methods depending on your desired error handling behaviour:

find_block_by_id() returns None when no such block exists (and we explicitly test that functionality)
get_block_by_id() raises a ValueError when no such block exists
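
As a rough illustration of the difference between the two lookups, here is a minimal sketch. MiniDocument is a hypothetical stand-in written for this example and is not the library's TDocument implementation; the error message text is also an assumption. Only the None-versus-ValueError contract is taken from the description above.

```python
from typing import Dict, Optional


class MiniDocument:
    """Hypothetical stand-in for TDocument (illustration only, not library code)."""

    def __init__(self, blocks: Dict[str, dict]) -> None:
        # Blocks indexed by their ID.
        self._blocks = blocks

    def find_block_by_id(self, block_id: str) -> Optional[dict]:
        # Lenient lookup: a missing ID yields None instead of an error.
        return self._blocks.get(block_id)

    def get_block_by_id(self, block_id: str) -> dict:
        # Strict lookup: a missing ID raises ValueError.
        block = self.find_block_by_id(block_id)
        if block is None:
            raise ValueError(f"no block for id: {block_id}")
        return block


doc = MiniDocument({"a1": {"Id": "a1", "BlockType": "LINE"}})

print(doc.find_block_by_id("missing"))  # lenient: no exception
try:
    doc.get_block_by_id("missing")      # strict: raises
except ValueError as err:
    print(f"caught: {err}")
```

Callers that want to tolerate missing blocks would use the find-style lookup and check for None; callers that treat a missing block as a hard error would use the get-style lookup.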

From your original description, I understand the issue is that TRP throws an error when initially loading/parsing a JSON that references (i.e. somewhere in a block's Relationships) a block ID that does not exist?

I understand (unless @Belval wants to correct me) that throwing an error when loading a document with missing block(s) is by design, and that the ability to handle malformed JSON gracefully would be a feature request.

If I'm right, could you help by sharing some extra details on what type of block is missing from your JSON / where it's referenced?
If I'm wrong and you're seeing an actual bug with find_block_by_id itself throwing an error, please let us know!
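
One way to gather those details is to scan the raw response JSON for relationship IDs that point at no existing block. The sketch below works on plain dicts and does not use the trp2 library. It assumes the usual Textract response shape (a top-level "Blocks" list whose items carry "Id", "BlockType", and an optional "Relationships" list of {"Type", "Ids"} entries). The function name find_dangling_references and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def find_dangling_references(response: Dict) -> List[Tuple[str, str, str, str]]:
    """Return (referencing_id, block_type, relationship_type, missing_id) tuples."""
    blocks = response.get("Blocks", [])
    known_ids = {block["Id"] for block in blocks}
    dangling = []
    for block in blocks:
        # "Relationships" may be absent or null on leaf blocks.
        for rel in block.get("Relationships") or []:
            for target_id in rel.get("Ids", []):
                if target_id not in known_ids:
                    dangling.append(
                        (block["Id"], block.get("BlockType", "?"),
                         rel.get("Type", "?"), target_id)
                    )
    return dangling


# Hypothetical sample: the PAGE block references "line-2", which is missing.
sample = {
    "Blocks": [
        {
            "Id": "page-1",
            "BlockType": "PAGE",
            "Relationships": [{"Type": "CHILD", "Ids": ["line-1", "line-2"]}],
        },
        {"Id": "line-1", "BlockType": "LINE"},
    ]
}

print(find_dangling_references(sample))
# [('page-1', 'PAGE', 'CHILD', 'line-2')]
```

Running this against the failing JSON (loaded with json.load) should show which block types hold the broken references and which IDs are missing.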

aws-samples / amazon-textract-response-parser

trp2.get_block_by_id #176