aws-samples / amazon-textract-textractor

Analyze documents with Amazon Textract and generate output in multiple formats.
Apache License 2.0

For textractor.entities.line.Line - visualize() breaks #312

Open h55nick opened 7 months ago

h55nick commented 7 months ago

When I try to visualize Line objects, I get the following exception:

    106     return EntityList(list(set(new_entity_list))).visualize(
    107         with_text=with_text,
    108         with_words=with_words,
    109         with_confidence=with_confidence,
    110         font_size_ratio=font_size_ratio,
    111     )
    112 elif len(self) > 0 and self[0].bbox.spatial_object.image is None:
--> 113     raise NoImageException(
    114         "Image was not saved during the Textract API call. Set save_image=True when calling the Textractor methods to use the visualize() method."
    115     )
    117 visualized_images = {}
    118 entities_pagewise = defaultdict(list)

NoImageException: Image was not saved during the Textract API call. Set save_image=True when calling the Textractor methods to use the visualize() method.

I can confirm that save_image=True is set in the call:

document = extractor.analyze_document(
    save_image=True,
    file_source=image,
    features=FEATURES,
)

and I can visualize Document, KeyValues, Tables, etc. correctly from the same extraction.
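
The traceback shows the exception is raised when `self[0].bbox.spatial_object.image` is None, so one way to narrow this down is to list which entities have no image attached. The following is only a minimal sketch: `entities_missing_image` is a hypothetical helper, not part of Textractor, and it assumes nothing beyond the `bbox.spatial_object.image` attribute chain visible in the traceback. Stand-in objects are used so it runs without an AWS call.

```python
from types import SimpleNamespace


def entities_missing_image(entities):
    """Return the entities whose bbox.spatial_object.image is None.

    Hypothetical helper: it relies only on the attribute chain that
    appears in the traceback, accessed defensively with getattr.
    """
    missing = []
    for entity in entities:
        bbox = getattr(entity, "bbox", None)
        spatial_object = getattr(bbox, "spatial_object", None)
        if getattr(spatial_object, "image", None) is None:
            missing.append(entity)
    return missing


def _stub(name, image):
    """Build a stand-in entity with the same attribute chain (illustration only)."""
    spatial_object = SimpleNamespace(image=image)
    return SimpleNamespace(name=name, bbox=SimpleNamespace(spatial_object=spatial_object))


with_image = _stub("line-with-image", image=object())
without_image = _stub("line-without-image", image=None)

print([e.name for e in entities_missing_image([with_image, without_image])])
# → ['line-without-image']
```

On a real extraction, the argument would be the list of Line objects being visualized. If Line entities are returned here while tables and key-values are not, that would suggest the Line bounding boxes are not linked to the object that holds the saved image.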

Belval commented 7 months ago

I was not able to reproduce this issue with our internal samples. If you can share the Textract response or the original asset needed to reproduce it, I can look into it.

If the asset cannot be shared publicly, feel free to send it to belvae [AT] amazon.com. Thanks!