aws-samples / amazon-textract-textractor

Analyze documents with Amazon Textract and generate output in multiple formats.
Apache License 2.0

Overlayer broken with DocumentDimension not subscriptable #209

Open miluna8 opened 1 year ago

miluna8 commented 1 year ago

TypeError                                 Traceback (most recent call last)
Cell In[45], line 12
      9 document_dimension:DocumentDimensions = DocumentDimensions(doc_width=image.size[0], doc_height=image.size[1])
     10 overlay=[Textract_Types.WORD, Textract_Types.CELL]
---> 12 bounding_box_list = get_bounding_boxes(textract_json=doc, document_dimensions=document_dimension, overlay_features=overlay)

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/textractoverlayer/t_overlay.py:103, in get_bounding_boxes(textract_json, overlay_features, document_dimensions)
    101 page_number: int = 0
    102 for page in doc.pages:
--> 103     page_dimensions = document_dimensions[page_number]
    104     page_number += 1
    105     if (Textract_Types.WORD in overlay_features or Textract_Types.LINE in overlay_features):

TypeError: 'DocumentDimensions' object is not subscriptable

jasonchester commented 7 months ago

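Wrapping the document dimensions in a list fixed this for me. Judging by the traceback, get_bounding_boxes indexes document_dimensions by page number, so it expects a list with one DocumentDimensions entry per page.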

bounding_box_list = get_bounding_boxes(textract_json=textract_json, document_dimensions=[document_dimension], overlay_features=overlay)
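
To illustrate why the list wrapper helps, here is a minimal sketch that runs without textractoverlayer installed. The DocumentDimensions dataclass is a stand-in whose fields are assumed from the constructor call in the traceback, and page_dimensions_for is a hypothetical helper that only mimics the document_dimensions[page_number] lookup at t_overlay.py line 103; neither is the library's actual code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentDimensions:
    """Stand-in for textractoverlayer's DocumentDimensions (fields assumed)."""
    doc_width: float
    doc_height: float


def page_dimensions_for(document_dimensions, page_count: int) -> List[DocumentDimensions]:
    """Hypothetical helper mimicking the per-page lookup seen in the traceback."""
    result = []
    for page_number in range(page_count):
        # Same indexing pattern as t_overlay.py line 103: one entry per page.
        result.append(document_dimensions[page_number])
    return result


single = DocumentDimensions(doc_width=1700, doc_height=2200)

# Passing the bare object reproduces the TypeError from the issue.
error_message = ""
try:
    page_dimensions_for(single, page_count=1)
except TypeError as exc:
    error_message = str(exc)

# Wrapping it in a list (one element per page) makes the lookup succeed.
one_page = page_dimensions_for([single], page_count=1)

# For a multi-page document, the same lookup needs one entry per page.
two_pages = page_dimensions_for(
    [single, DocumentDimensions(doc_width=1700, doc_height=2200)],
    page_count=2,
)
```

If the real function follows the same pattern, a multi-page document would presumably need one DocumentDimensions per page in that list, otherwise the lookup would raise an IndexError on the second page.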