annelo-msft closed 4 years ago
@NHaiby and @cha-zhang confirmed over email that blank pages will be included in a document's page range. In this case, returning the pages that fall within the page range should suffice. Supervised does not currently support multiple forms in a document.
@NHaiby shared the following example:
The analyze response for supervised is for a single document, where a document can have multiple pages. The labeled values in the documentResults section can come from certain pages, while the readResults and pageResults sections can include additional pages from the document. For example, suppose a PDF document has 4 pages and the customer labeled values on pages 1, 2, and 4. The output will be as follows:
- readResults includes the text extracted from all pages, 1 to 4.
- pageResults includes the tables extracted from the document; say there is a table on page 3.
- documentResults includes the values the customer wants to extract from pages 1, 2, and 4.
All of the above data is needed by the customer. For example, they can take the table from page 3 and the labeled values as part of their structured results, put them in their ERP, and use the readResults text to overlay these results so that a human reviewer can verify them and correct them if needed.
In the new design, these values will all be available on RecognizedForm as follows:
RecognizedForm.Fields <-- fields by name from pages 1, 2, and 4, from documentResults
RecognizedForm.Pages.Tables <-- tables from page 3, from pageResults
RecognizedForm.Pages.Lines <-- text extracted from pages 1-4, from readResults
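The mapping above can be sketched in Python. This is only an illustration under stated assumptions: the dictionary keys (`page`, `lines`, `tables`, `fields`) are a simplified, hypothetical response shape, and the `RecognizedForm` and `FormPage` classes below are stand-ins written for this example, not the actual types of either SDK.

```python
from dataclasses import dataclass, field


@dataclass
class FormPage:
    page_number: int
    lines: list = field(default_factory=list)   # text lines, from readResults
    tables: list = field(default_factory=list)  # tables, from pageResults


@dataclass
class RecognizedForm:
    fields: dict  # labeled values by name, from documentResults
    pages: list   # one FormPage per page of the document


def build_recognized_form(response: dict) -> RecognizedForm:
    """Combine the three result sections into one form (hypothetical shape)."""
    pages = {}
    for read in response["readResults"]:
        pages[read["page"]] = FormPage(read["page"], lines=list(read["lines"]))
    for page_result in response["pageResults"]:
        # setdefault guards against a page that appears only in pageResults
        page = pages.setdefault(page_result["page"], FormPage(page_result["page"]))
        page.tables = list(page_result["tables"])
    document = response["documentResults"][0]  # supervised: a single document
    return RecognizedForm(
        fields=dict(document["fields"]),
        pages=[pages[n] for n in sorted(pages)],
    )


# The 4-page PDF from the example: labels on pages 1, 2 and 4, a table on page 3.
response = {
    "readResults": [{"page": n, "lines": [f"text on page {n}"]} for n in range(1, 5)],
    "pageResults": [{"page": n, "tables": ["table"] if n == 3 else []} for n in range(1, 5)],
    "documentResults": [
        {"fields": {"Name": {"page": 1}, "Date": {"page": 2}, "Total": {"page": 4}}}
    ],
}
form = build_recognized_form(response)
```

In this sketch the form keeps all four pages even though no labeled field comes from page 3, so the table on that page stays reachable through `form.pages`.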
Was: Resolve concerns regarding adding all OCR pages to Receipt
A single page doesn't necessarily map to a documentResult. For example, a blank page might not be part of any documentResult, but we should still return the metadata for that page.
Python will do this: any method that provides a documentResult should repeat all pages of the readResult/pageResult on each document.
For .NET, we should do our best to include only the pages relevant to the document. We need to follow up with the service team on this.
See: https://github.com/Azure/azure-sdk-for-net/pull/10562#discussion_r395404208
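The two ways of attaching pages to a document discussed in this thread can be contrasted in a short Python sketch. It is a rough illustration only: the function names, the `pageRange` key, and the sample data are assumptions made for this example and are not taken from the service response or from either SDK.

```python
def pages_repeat_all(all_pages: list, documents: list) -> list:
    """Attach every page to every document (the approach described for Python)."""
    return [list(all_pages) for _ in documents]


def pages_in_range(all_pages: list, documents: list) -> list:
    """Attach only the pages inside each document's inclusive page range.

    Blank pages inside the range are kept, on the assumption that the
    page range includes them.
    """
    result = []
    for doc in documents:
        first, last = doc["pageRange"]
        result.append([p for p in all_pages if first <= p["page"] <= last])
    return result


# Hypothetical data: 5 pages, page 3 is blank; one document covering pages 2-4.
all_pages = [{"page": n, "lines": [] if n == 3 else [f"line {n}"]} for n in range(1, 6)]
documents = [{"pageRange": (2, 4)}]

repeated = pages_repeat_all(all_pages, documents)
filtered = pages_in_range(all_pages, documents)
```

With this sample data, `repeated[0]` holds all five pages, while `filtered[0]` holds pages 2 to 4, including the blank page 3 and its empty metadata.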