aws-samples / amazon-textract-response-parser

Parse JSON response of Amazon Textract
Apache License 2.0

`add_page_orientation` does not handle pages without words (e.g. blank pages) (Python) #156

Closed MattExact closed 3 months ago

MattExact commented 1 year ago

add_page_orientation raises an error on documents with blank pages.

If the input data for statistics.mode is empty, a StatisticsError is raised (see the Python docs).

This could be fixed with something along the lines of:

word_orientations = [
    round(__get_degree_from_polygon(w.geometry.polygon))
    for w in words
    if w.geometry and w.geometry.polygon
]
# Fall back to 0 degrees when the page has no words (e.g. a blank page)
orientation = statistics.mode(word_orientations) if word_orientations else 0

Or some other alternative 🤷‍♂️
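For anyone wanting to reproduce this outside the library, here is a minimal standalone sketch of the failure and the guard. `safe_mode` and its `default` parameter are hypothetical names made up for illustration and are not part of `trp`; the only real API used is `statistics.mode` and `statistics.StatisticsError` from the standard library.

```python
import statistics


def safe_mode(values, default=0):
    """Return the most common value, or `default` when `values` is empty.

    Hypothetical helper for illustration only; not part of trp.
    """
    values = list(values)
    return statistics.mode(values) if values else default


# A blank page yields no word orientations, so mode() gets an empty list.
try:
    statistics.mode([])
except statistics.StatisticsError as exc:
    print(f"StatisticsError: {exc}")

# With the guard, the empty case falls back to the default instead of raising.
print(safe_mode([]))
print(safe_mode([0, 0, 90]))
```

Whether the fallback should be 0 is a judgment call: 0 treats a blank page as upright, while something like `None` would let callers tell "no words found" apart from "words found at 0 degrees".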

https://github.com/aws-samples/amazon-textract-response-parser/blob/541c07a12d603deed70699357f865d6974369c7b/src-python/trp/t_pipeline.py#L136-L150