Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.
MIT License

Resolve concerns regarding adding all OCR pages to Supervised Forms/Receipts #10744

Closed annelo-msft closed 4 years ago

annelo-msft commented 4 years ago

Was: Resolve concerns regarding adding all OCR pages to Receipt

A single page doesn't necessarily map to a documentResult. E.g. a blank page might not be part of any document result, but we should still return the metadata for that page.

Python will do this: any method that provides a documentResult should repeat all pages of the readResult/pageResult on each document.

For .NET, we should do our best to include only the pages relevant to the document. We need to follow up with the service team on this.

See: https://github.com/Azure/azure-sdk-for-net/pull/10562#discussion_r395404208

annelo-msft commented 4 years ago

@NHaiby and @cha-zhang confirmed over email that blank pages will be included in a document's page range. In that case, returning the pages that fall within the document's page range should suffice. Supervised does not currently support multiple forms in a document.
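As a rough illustration of selecting pages by page range, here is a minimal Python sketch. It assumes each document result carries an inclusive `pageRange` of `[first, last]` and each read result carries a `page` number; the helper name `pages_in_range` and the sample data are hypothetical and are not part of any SDK.

```python
from typing import Any, Dict, List


def pages_in_range(read_results: List[Dict[str, Any]],
                   page_range: List[int]) -> List[Dict[str, Any]]:
    """Return the read results whose page number lies in the inclusive range."""
    first, last = page_range
    return [p for p in read_results if first <= p["page"] <= last]


# Hypothetical 4-page document: page 3 is blank but still inside the range.
read_results = [
    {"page": 1, "lines": ["Invoice"]},
    {"page": 2, "lines": ["Total: 10"]},
    {"page": 3, "lines": []},
    {"page": 4, "lines": ["Signature"]},
]
# Assumed shape of a document result; only pageRange matters here.
document_result = {"pageRange": [1, 4], "fields": {}}

selected = pages_in_range(read_results, document_result["pageRange"])
print([p["page"] for p in selected])
```

Because the blank page sits inside the range, its metadata is returned along with the other pages, with no special handling needed.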

@NHaiby shared the following example:

The analyze response for supervised is for a single document, where a document can have multiple pages. The labeled values in the documentResults section can come from certain pages, while the readResults and pageResults sections can include additional pages from the document. For example, suppose a PDF document has 4 pages and the customer labeled values on pages 1, 2, and 4. The output will be as follows:

  • readResults includes the text extracted from all pages, 1 to 4.
  • pageResults includes the tables extracted from the document; say there is a table on page 3.
  • documentResults includes the values the customer wants to extract from pages 1, 2, and 4.

All of the above data is needed by the customer. For example, they can take the table from page 3 as part of their structured results, use the values as part of the structured results and put them in their ERP, and use the readResults text to overlay the document so that a human validator can verify the results and correct them if needed.

In the new design, these values will all be available on RecognizedForm as follows:

RecognizedForm.Fields        <-- fields by name from pages 1, 2, and 4, from documentResults
RecognizedForm.Pages.Tables  <-- tables from page 3, from pageResults
RecognizedForm.Pages.Lines   <-- text extracted from pages 1-4, from readResults
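The mapping above could be sketched roughly as follows. This is an illustration in Python, not the SDK's implementation: the `FormPage` and `RecognizedForm` dataclasses, the `build_recognized_form` helper, and the simplified response shape (plain strings for lines, tables, and field values) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FormPage:
    # Hypothetical stand-in for a page on RecognizedForm.Pages.
    page_number: int
    lines: List[str] = field(default_factory=list)
    tables: List[Any] = field(default_factory=list)


@dataclass
class RecognizedForm:
    # Hypothetical stand-in for the RecognizedForm described above.
    fields: Dict[str, Any]
    pages: List[FormPage]


def build_recognized_form(analyze_result: Dict[str, Any]) -> RecognizedForm:
    # Supervised analysis currently yields a single document result.
    document = analyze_result["documentResults"][0]
    first, last = document["pageRange"]

    # Index tables by page number so each one lands on the matching page.
    tables_by_page = {
        p["page"]: p.get("tables", [])
        for p in analyze_result.get("pageResults", [])
    }

    # Keep every page inside the document's page range, blank or not.
    pages = [
        FormPage(
            page_number=r["page"],
            lines=list(r.get("lines", [])),
            tables=list(tables_by_page.get(r["page"], [])),
        )
        for r in analyze_result["readResults"]
        if first <= r["page"] <= last
    ]
    return RecognizedForm(fields=dict(document["fields"]), pages=pages)


# Simplified response for the 4-page example: labels on pages 1, 2, and 4,
# and one table on page 3. All values are made up.
analyze_result = {
    "readResults": [
        {"page": 1, "lines": ["line on page 1"]},
        {"page": 2, "lines": ["line on page 2"]},
        {"page": 3, "lines": ["line on page 3"]},
        {"page": 4, "lines": ["line on page 4"]},
    ],
    "pageResults": [{"page": 3, "tables": ["table on page 3"]}],
    "documentResults": [
        {
            "pageRange": [1, 4],
            "fields": {"Name": "value from page 1",
                       "Total": "value from page 2",
                       "Signature": "value from page 4"},
        }
    ],
}

form = build_recognized_form(analyze_result)
print(sorted(form.fields))
print([(p.page_number, len(p.lines), len(p.tables)) for p in form.pages])
```

In this sketch the fields stay at the form level while lines and tables hang off each page, so page 3 contributes a table and text even though no labeled value comes from it.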