aws-samples / amazon-textract-response-parser

Parse JSON response of Amazon Textract
Apache License 2.0

(src-js) `ApiRelationshipType` is missing `TABLE_TITLE` #171

Closed rob3c closed 8 months ago

rob3c commented 10 months ago

It's unclear whether this library intentionally restricts the types exposed by the Textract API responses, but the TypeScript types have a number of omissions that prevent us from accessing certain properties and values without coding around them (e.g. my previous issue about the missing Page property on blocks).

In this case, `ApiRelationshipType` is missing `TABLE_TITLE`, even though that value appears in the JSON test responses used in this project's test suite.

The types in this library are often more useful than those in @aws-sdk/client-textract. For example, properties aren't all declared optional when the API always includes them, properties that don't apply to a given BlockType are omitted entirely, etc.

Since Textract is often confused and doesn't parse structures such as tables correctly for certain layouts, I need to write custom code to find the missing data elsewhere in the results and fix things. I'd like to use the types in this library with that code whenever possible for simplicity.
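
For illustration, here is a minimal sketch of the kind of workaround I mean, written against the raw response JSON. Everything in it is an assumption for the example: `RawBlock`, `RawRelationship`, `WidenedRelationshipType` and `relatedBlocks` are hypothetical local names, not exports of this library, and the sample blocks are made up.

```typescript
// Hypothetical local types sketching the raw Textract JSON shape.
// These are NOT this library's exported types.
type WidenedRelationshipType =
  | "CHILD"
  | "VALUE"
  | "MERGED_CELL"
  | "TABLE_TITLE" // the value reported missing in this issue
  | "TABLE_FOOTER";

interface RawRelationship {
  Type: WidenedRelationshipType;
  Ids: string[];
}

interface RawBlock {
  Id: string;
  BlockType: string;
  Relationships?: RawRelationship[];
}

// Collect the blocks that `block` links to via relationships of `relType`.
function relatedBlocks(
  block: RawBlock,
  relType: WidenedRelationshipType,
  blocksById: { [id: string]: RawBlock }
): RawBlock[] {
  const found: RawBlock[] = [];
  for (const rel of block.Relationships || []) {
    if (rel.Type !== relType) continue;
    for (const id of rel.Ids) {
      const target = blocksById[id];
      // Skip IDs that do not resolve to a block in this response.
      if (target !== undefined) found.push(target);
    }
  }
  return found;
}

// Made-up sample data: one table with a child cell and a linked title.
const blocks: RawBlock[] = [
  {
    Id: "table-1",
    BlockType: "TABLE",
    Relationships: [
      { Type: "CHILD", Ids: ["cell-1"] },
      { Type: "TABLE_TITLE", Ids: ["title-1"] },
    ],
  },
  { Id: "cell-1", BlockType: "CELL" },
  { Id: "title-1", BlockType: "TABLE_TITLE" },
];

// Index the blocks by Id so relationships can be resolved.
const blocksById: { [id: string]: RawBlock } = {};
for (const b of blocks) blocksById[b.Id] = b;

const titles = relatedBlocks(blocks[0], "TABLE_TITLE", blocksById);
```

With local types like these the lookup compiles, but it duplicates definitions I'd rather import from the library.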

However, if the types here only represent the subset of the Textract API model surface area that is used internally by this library, then I won't bother the maintainers with more issues like this. Just let me know.

Thanks in advance!

athewsey commented 9 months ago

Hi @rob3c - thanks for your feedback on the library and calling this issue out!

Previously, TRP.js was not properly pulling through links from tables to TABLE_FOOTER and TABLE_TITLE blocks... So this was definitely a bug.

I believe it should be fixed in the pre-release 0.4.0-alpha.3 now available on NPM - if there's any chance you'd have time to test it out?


To your question about feedback like this in general:

athewsey commented 8 months ago

amazon-textract-response-parser v0.4.0 is now released, and we believe it should address this issue (and generally fix/implement TABLE_TITLE & TABLE_FOOTER blocks properly).

Thanks for the feedback, and please do re-open if you're still seeing the original problem!