Closed aarif1996 closed 1 year ago
I have a similar issue. When I get a straight answer, I do get coordinates, e.g. "What is the title of this doc?" on page 1. However, when I get an 'interpreted' answer, e.g. "What are the standards of this doc?" on page 1, the geometry is set to None.
query is TBlock(geometry=None, id='d1a1bac6-8c00-4b8b-91ef-72ff7d3398d9', block_type='QUERY', relationships=[TRelationship(type='ANSWER', ids=['d3c0611d-a7ba-48ed-9d4a-031e64a3d4f3'])], confidence=None, text=None, column_index=None, column_span=None, entity_types=None, page=1, row_index=None, row_span=None, selection_status=None, text_type=None, custom=None, query=TQuery(text='what are the standards of the certified weight?', alias='tc_certified_shipping_standards'))
rels is TRelationship(type='ANSWER', ids=['d3c0611d-a7ba-48ed-9d4a-031e64a3d4f3']) [TBlock(geometry=None, id='d3c0611d-a7ba-48ed-9d4a-031e64a3d4f3', block_type='QUERY_RESULT', relationships=None, confidence=43.0, text='GRS, GRS', column_index=None, column_span=None, entity_types=None, page=1, row_index=None, row_span=None, selection_status=None, text_type=None, custom=None, query=None)]
I have quite a big chunk of code that depends on coordinates, and for 5 months straight I had no issue. I checked by reverting the other Textract-related libraries to their old versions and by testing on old git branches.
So, is this a new way for Textract to answer questions?
@aarif1996 Your issue is with the textractor package, not the amazon-textract-response-parser.
@anyaovi: Does your text 'GRS, GRS' exist on the page, or is it inferred? Query results may not include coordinates when the text is inferred. You do not get an exception, correct?
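For code that depends on coordinates, one option is to treat the geometry of a query answer as optional. The following is only a sketch that works on the raw Textract JSON response (a plain dict) rather than on the parser objects; the function name `query_answers` and the sample response are made up for illustration, and only the 'GRS, GRS' answer is taken from the output above.

```python
from typing import Any, Dict, List


def query_answers(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each QUERY block with its QUERY_RESULT blocks.

    The bounding box is None when the answer block carries no Geometry.
    """
    blocks = response.get("Blocks", [])
    blocks_by_id = {b["Id"]: b for b in blocks}
    answers = []
    for block in blocks:
        if block.get("BlockType") != "QUERY":
            continue
        for rel in block.get("Relationships") or []:
            if rel.get("Type") != "ANSWER":
                continue
            for answer_id in rel.get("Ids", []):
                result = blocks_by_id.get(answer_id)
                if result is None:
                    continue
                # Geometry may be absent or None, so never index it directly.
                geometry = result.get("Geometry") or {}
                answers.append({
                    "alias": block.get("Query", {}).get("Alias"),
                    "text": result.get("Text"),
                    "confidence": result.get("Confidence"),
                    "bounding_box": geometry.get("BoundingBox"),
                })
    return answers


# Hypothetical response: ids, the title answer and its box are invented.
SAMPLE_RESPONSE = {
    "Blocks": [
        {"Id": "q1", "BlockType": "QUERY",
         "Query": {"Text": "What is the title of this doc?", "Alias": "title"},
         "Relationships": [{"Type": "ANSWER", "Ids": ["a1"]}]},
        {"Id": "a1", "BlockType": "QUERY_RESULT", "Text": "Sample Title",
         "Confidence": 98.0,
         "Geometry": {"BoundingBox": {"Left": 0.1, "Top": 0.1,
                                      "Width": 0.3, "Height": 0.05}}},
        {"Id": "q2", "BlockType": "QUERY",
         "Query": {"Text": "what are the standards of the certified weight?",
                   "Alias": "tc_certified_shipping_standards"},
         "Relationships": [{"Type": "ANSWER", "Ids": ["a2"]}]},
        # No "Geometry" key at all, as with an inferred answer.
        {"Id": "a2", "BlockType": "QUERY_RESULT", "Text": "GRS, GRS",
         "Confidence": 43.0},
    ]
}

for answer in query_answers(SAMPLE_RESPONSE):
    located = "located" if answer["bounding_box"] else "no coordinates"
    print(answer["alias"], repr(answer["text"]), located)
```

Downstream code can then branch on `bounding_box is None` instead of assuming every answer can be placed on the page.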
I will close this one; aws-samples/amazon-textract-textractor#195 is the ticket for the KeyError: 'Geometry'.
Traceback (most recent call last):
  File "/home/ubuntu/sample.py", line 52, in textract_output
    document = extractor.analyze_document(
  File "/usr/local/lib/python3.8/site-packages/textractor/textractor.py", line 438, in analyze_document
    document = response_parser.parse(response)
  File "/usr/local/lib/python3.8/site-packages/textractor/parsers/response_parser.py", line 906, in parse
    return parse_document_api_response(response)
  File "/usr/local/lib/python3.8/site-packages/textractor/parsers/response_parser.py", line 770, in parse_document_api_response
    queries = _create_query_objects(
  File "/usr/local/lib/python3.8/site-packages/textractor/parsers/response_parser.py", line 381, in _create_query_objects
    query_results = _create_query_result_objects(
  File "/usr/local/lib/python3.8/site-packages/textractor/parsers/response_parser.py", line 419, in _create_query_result_objects
    block["Geometry"]["BoundingBox"], spatial_object=page
KeyError: 'Geometry'
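The last frame indexes `block["Geometry"]` directly, so any answer block without that key would raise this error. To check whether a given response is affected before handing it to a parser, a small diagnostic over the raw response dict might look like the sketch below. The helper name `blocks_missing_geometry` and the sample data are hypothetical and not part of textractor; by default it only inspects `QUERY_RESULT` blocks, since the `QUERY` block in the dump above also has no geometry.

```python
from typing import Any, Dict, Iterable, List, Tuple


def blocks_missing_geometry(
    response: Dict[str, Any],
    block_types: Iterable[str] = ("QUERY_RESULT",),
) -> List[Tuple[str, str]]:
    """Return (Id, BlockType) for blocks of the given types lacking Geometry."""
    wanted = set(block_types)
    return [
        (block.get("Id"), block.get("BlockType"))
        for block in response.get("Blocks", [])
        # A missing key and an explicit None are both treated as "no geometry".
        if block.get("BlockType") in wanted and not block.get("Geometry")
    ]


# Hypothetical response with one located and one unlocated answer.
sample = {
    "Blocks": [
        {"Id": "a1", "BlockType": "QUERY_RESULT", "Text": "Sample Title",
         "Geometry": {"BoundingBox": {"Left": 0.1, "Top": 0.1,
                                      "Width": 0.3, "Height": 0.05}}},
        {"Id": "a2", "BlockType": "QUERY_RESULT", "Text": "GRS, GRS"},
    ]
}

for block_id, block_type in blocks_missing_geometry(sample):
    print(f"{block_type} {block_id} has no Geometry")
```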