I've verified that I can reproduce this: the document works in the Textract console, and the API response contains a non-empty `ExpenseDocuments` list.
The problem appears to be this line (with a `FIXME`), which specifically ignores any `ExpenseDocuments` entry with no `SummaryFields`. The example document contains no summary fields, only line items.
Presumably this is because Textractor needs to determine the page number of the expense document immediately afterwards, even though the Textract API response structure does not explicitly tie an `ExpenseDocument` to a single `Page` and seems to be designed to support multi-page invoices. Currently it does this by looking for a `PageNumber` annotation on the first summary field.
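A minimal sketch of the lookup described above, using plain dicts shaped like the `AnalyzeExpense` response. `page_from_summary_fields` is a hypothetical helper, not Textractor's actual code; it only illustrates why an entry with an empty `SummaryFields` list yields no page number.

```python
from typing import Any, Dict, Optional


def page_from_summary_fields(expense_doc: Dict[str, Any]) -> Optional[int]:
    """Hypothetical helper: read PageNumber from the first SummaryFields entry."""
    summary_fields = expense_doc.get("SummaryFields") or []
    if not summary_fields:
        # A receipt with only line items lands here: there is no summary
        # field to read a page number from.
        return None
    return summary_fields[0].get("PageNumber")


# Made-up ExpenseDocuments entry: line items only, no summary fields.
line_items_only = {
    "ExpenseIndex": 1,
    "SummaryFields": [],
    "LineItemGroups": [
        {
            "LineItemGroupIndex": 1,
            "LineItems": [
                {"LineItemExpenseFields": [
                    {"Type": {"Text": "ITEM"}, "ValueDetection": {"Text": "Coffee"}, "PageNumber": 1}
                ]}
            ],
        }
    ],
}

print(page_from_summary_fields(line_items_only))  # None
```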
As a minimally invasive fix, I think it should also be possible to fall back to fetching this information from the first line item via:
textract_json["ExpenseDocuments"][0]["LineItemGroups"][0]["LineItems"][0]["LineItemExpenseFields"][0]["PageNumber"]
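The proposed fallback could look roughly like this. `first_page_number` is a hypothetical name; rather than indexing `[0]` at every level as in the expression above, it walks the nested lists and returns the first `PageNumber` it finds, so an empty `LineItemGroups` or `LineItems` list returns `None` instead of raising `IndexError`.

```python
from typing import Any, Dict, Optional


def first_page_number(expense_doc: Dict[str, Any]) -> Optional[int]:
    """Hypothetical fallback: summary fields first, then line-item fields."""
    # Check summary fields first, as the current lookup does.
    for field in expense_doc.get("SummaryFields") or []:
        if "PageNumber" in field:
            return field["PageNumber"]
    # Fallback: LineItemGroups -> LineItems -> LineItemExpenseFields.
    for group in expense_doc.get("LineItemGroups") or []:
        for item in group.get("LineItems") or []:
            for field in item.get("LineItemExpenseFields") or []:
                if "PageNumber" in field:
                    return field["PageNumber"]
    # Nothing carries a page number; the caller decides what to do.
    return None


# Made-up receipt entry with line items but no summary fields.
receipt = {
    "SummaryFields": [],
    "LineItemGroups": [
        {"LineItems": [{"LineItemExpenseFields": [{"Type": {"Text": "PRICE"}, "PageNumber": 1}]}]}
    ],
}
print(first_page_number(receipt))  # 1
print(first_page_number({}))  # None
```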
...but it may also be worth considering whether Textractor should keep treating each `ExpenseDocument` as belonging to a single page, rather than potentially spanning multiple pages.
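If expense documents were instead modeled as potentially spanning several pages, one option would be to collect every page referenced by the document's fields. This is a sketch under the assumption that summary and line-item fields carry a `PageNumber`; `pages_spanned` and `_all_fields` are hypothetical names, and the sample invoice is made up.

```python
from typing import Any, Dict, Iterator, List


def _all_fields(expense_doc: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield summary fields, then every line-item field."""
    yield from expense_doc.get("SummaryFields") or []
    for group in expense_doc.get("LineItemGroups") or []:
        for item in group.get("LineItems") or []:
            yield from item.get("LineItemExpenseFields") or []


def pages_spanned(expense_doc: Dict[str, Any]) -> List[int]:
    """Hypothetical helper: sorted, de-duplicated page numbers used by the document."""
    return sorted({f["PageNumber"] for f in _all_fields(expense_doc) if "PageNumber" in f})


# Made-up two-page invoice: a summary field on page 1, line items on pages 1 and 2.
invoice = {
    "SummaryFields": [{"Type": {"Text": "TOTAL"}, "PageNumber": 1}],
    "LineItemGroups": [
        {"LineItems": [
            {"LineItemExpenseFields": [{"Type": {"Text": "ITEM"}, "PageNumber": 1}]},
            {"LineItemExpenseFields": [{"Type": {"Text": "ITEM"}, "PageNumber": 2}]},
        ]}
    ],
}
print(pages_spanned(invoice))  # [1, 2]
```

A document-level object could then expose this list rather than a single page number, which avoids having to pick one page for a multi-page invoice.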
Hi @arsher-b, we believe this should now be solved with the v1.8.0 release. It'd be great if you could help confirm?
Hello Sir @athewsey, I've confirmed that the issue is resolved with the v1.8.0 release. Thanks!
`analyze_expense` returns 0 `expense_documents` even though the image is a receipt. Here's the receipt used:
Output:
Can someone fix this so that the receipt is still parsed as line items inside the Expense Documents instead of as `Lines`?
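One way to narrow this down is to check whether the raw Textract response itself contains any `ExpenseDocuments`, or whether they are present but lost during parsing. The sketch below assumes you have saved the raw `AnalyzeExpense` response as JSON (for example from boto3's Textract client); `summarize_expense_response` is a hypothetical helper and the sample response is made up. An entry with line items but zero summary fields would match the pattern discussed earlier in this thread.

```python
import json
from typing import Any, Dict, List


def summarize_expense_response(response: Dict[str, Any]) -> List[Dict[str, int]]:
    """Hypothetical helper: count summary fields and line items per ExpenseDocument."""
    summary = []
    for doc in response.get("ExpenseDocuments") or []:
        line_items = sum(len(g.get("LineItems") or []) for g in doc.get("LineItemGroups") or [])
        summary.append({
            "expense_index": doc.get("ExpenseIndex", -1),
            "summary_fields": len(doc.get("SummaryFields") or []),
            "line_items": line_items,
        })
    return summary


# In practice this would be the saved output of e.g.
# boto3.client("textract").analyze_expense(Document={"Bytes": ...});
# this made-up response only illustrates the shape.
raw = json.loads(
    '{"ExpenseDocuments": [{"ExpenseIndex": 1, "SummaryFields": [], '
    '"LineItemGroups": [{"LineItemGroupIndex": 1, "LineItems": '
    '[{"LineItemExpenseFields": []}, {"LineItemExpenseFields": []}]}]}]}'
)
print(summarize_expense_response(raw))
# [{'expense_index': 1, 'summary_fields': 0, 'line_items': 2}]
```

If this reports no entries at all, the receipt was not recognized by Textract itself; if it reports entries that Textractor's output lacks, the problem is on the parsing side.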