aws-samples / amazon-textract-response-parser

Parse JSON response of Amazon Textract
Apache License 2.0

trp2 document.lines are out of order #142

Closed schadem closed 1 year ago

schadem commented 1 year ago

Running the following test code

    import json
    import os
    import trp.trp2 as t2

    p = os.path.dirname(os.path.realpath(__file__))
    # Load the sample Textract JSON response that sits next to this test file
    with open(os.path.join(p, "data/little_women_page_1.json")) as f:
        j = json.load(f)
    t_document: t2.TDocument = t2.TDocumentSchema().load(j)    # type: ignore
    page = t_document.pages[0]
    # The first two lines of the page are expected in reading order
    assert "The Project Gutenberg EBook of Little Women, by Louisa M. Alcott" == t_document.lines(page=page)[0].text
    assert "This eBook is for the use of anyone anywhere at no cost and with" == t_document.lines(page=page)[1].text

would fail, because the lines returned by `t_document.lines(page=page)` are out of order.
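
For illustration only, here is a minimal sketch of one way to recover a stable line order from the raw JSON, without going through `trp2`. It assumes that the order of the `Ids` in the PAGE block's `CHILD` relationship reflects the intended line order; that is an assumption, not a confirmed description of how the library behaves or how this issue was fixed. The helper name `lines_in_page_order` and the sample blocks are hypothetical.

```python
from typing import Any, Dict, List


def lines_in_page_order(blocks: List[Dict[str, Any]], page_id: str) -> List[Dict[str, Any]]:
    """Return the LINE blocks of one page, in the order the page's CHILD relationship lists them."""
    by_id = {b["Id"]: b for b in blocks}
    page = by_id[page_id]
    ordered: List[Dict[str, Any]] = []
    for rel in page.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id.get(child_id)
            # Skip ids that are missing or that point at non-LINE children
            if child is not None and child["BlockType"] == "LINE":
                ordered.append(child)
    return ordered


# Hand-made sample (not real Textract output): the flat block list is
# deliberately shuffled relative to the page's child order.
blocks = [
    {"Id": "l2", "BlockType": "LINE", "Text": "second"},
    {"Id": "p1", "BlockType": "PAGE",
     "Relationships": [{"Type": "CHILD", "Ids": ["l1", "l2"]}]},
    {"Id": "l1", "BlockType": "LINE", "Text": "first"},
]
ordered_texts = [b["Text"] for b in lines_in_page_order(blocks, "p1")]
```

The point of the sketch is that iterating over the flat block list yields whatever order the blocks happen to be stored in, whereas following the page's child ids gives an order tied to the page itself.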