Board Review: FormRecognizer

ctstone commented 4 years ago

The Basics

Service team responsible for the client library: Form Recognizer Dev Team
Link to documentation describing the service: https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/overview
Contact email: PM: netahw; Dev Lead: rparab

About this client library

Name of the client library: Form Recognizer (Azure.AI.FormRecognizer)
Languages for this review: .NET
Link to the service REST APIs:
- Swagger, source
- Swagger, UI Friendly

Artifacts required (per language)

We use an API review tool (apiview) to support .NET and Java API reviews. For Python and TypeScript, use the API extractor tool, then submit the output as a Draft PR to the relevant repository (azure-sdk-for-python or azure-sdk-for-js).

TypeScript

Not started

Champion Scenarios

A champion scenario is a use case that the consumer of the client library is commonly expected to perform. Champion scenarios are used to ensure the developer experience is exemplary for the common cases. You need to show the entire code sample (including error handling, as an example) for the champion scenarios.

Champion Scenario 1:
- Train a custom model: User uploads form content (PDF, JPEG, TIFF, or PNG) to a blob container, generates a read-only SAS URL for that container, and passes it to the SDK to begin training. Uploading content and generating the SAS URL are not in scope for the SDK and are expected to be performed by the user using Storage Explorer or a custom script.
- % of users: 50%
- Sample code (WIP): https://github.com/ctstone/azure-sdk-for-net/blob/form-recognizer-v2.0-GA/sdk/formrecognizer/samples/Program.cs#L309-L330
Champion Scenario 2:
- Analyze form using a custom model: User sends a new form document (either local Stream or remote Uri) to be analyzed using a trained model. The SDK detects the correct content type based on the file content, or the user may specify a known value.
- % of users: 100%
- Sample code for local file (WIP): https://github.com/ctstone/azure-sdk-for-net/blob/form-recognizer-v2.0-GA/sdk/formrecognizer/samples/Program.cs#L247-L264
- Sample code for Uri (WIP): https://github.com/ctstone/azure-sdk-for-net/blob/form-recognizer-v2.0-GA/sdk/formrecognizer/samples/Program.cs#L266-L282
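
As a rough illustration of the content-type detection described in Champion Scenario 2, the sketch below sniffs the leading bytes of a stream for the four formats listed in Scenario 1. It is not the SDK's implementation: the function name detect_content_type and its content_type override parameter are hypothetical, and only the well-known file signatures are taken as given.

```python
import io
from typing import BinaryIO, Optional

# Leading "magic" bytes for the four formats named in the champion scenarios.
_SIGNATURES = (
    (b"%PDF", "application/pdf"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
)


def detect_content_type(stream: BinaryIO, content_type: Optional[str] = None) -> str:
    """Hypothetical helper: honor a caller-supplied type, else sniff the header."""
    if content_type is not None:
        return content_type
    start = stream.tell()
    header = stream.read(8)
    stream.seek(start)  # restore the position so the body can still be sent
    for magic, mime in _SIGNATURES:
        if header.startswith(magic):
            return mime
    raise ValueError("unrecognized form content; pass content_type explicitly")


# Usage: an in-memory stand-in for a local form file.
sample = io.BytesIO(b"%PDF-1.7 sample")
print(detect_content_type(sample))
```

Note that the helper has to rewind the stream after peeking, so sniffing only works on seekable streams.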

The prebuilt Analyze operations share the same API surface as custom Analyze.

Agenda for the review

A board review is generally split into two parts, with additional meetings as required

Part 1 - Introducing the board to the service:

Review of the service (no more than 10 minutes).
Review of the champion scenarios.
Get feedback on the API patterns used in the champion scenarios.

After part 1, you may schedule additional meetings with architects to refine the API and work on implementation.

Part 2 - the "GA" meeting

Scheduled at least one week after the APIs have been uploaded for review.
Will go over controversial feedback from the line-by-line API review.
Exit meeting with concrete changes necessary to meet quality bar.

Thank you for your submission

adrianhall commented 4 years ago

Scheduled for 1/30/2020

adrianhall commented 4 years ago

Notes from API Review Board:

Link to recording (MSFT INTERNAL): https://msit.microsoftstream.com/video/c5163ca9-9adf-4674-a738-dd1d04ffe5ac

Initial Notes:

Consider splitting classes into multiple namespaces, between "scenario" clients vs. "general purpose" client. In the scenario (pre-built) clients, the models are "real-world" and in the custom clients, the models are more ML.
If the source of the training data is a Blob, should we use the Blob Storage API?
- The training service can be run in a (docker) container, so it can be a path URI
- Open Issue: Can this be a BlobSource / FileSource - ultimately it's just a string.
- It's a little problematic because we are trying to teach people to use Storage ContainerClient
What happens if the Storage Uri doesn't have a SAS token? Do you use the identity that is doing the request?
- It needs to be GETable without identity.
- Would be good to work with DefaultAzureCredentials() - maybe on-behalf-of
There is an issue with Azure Core long running operations - when we wait on the operation, we can't tell what the response is - should we throw? There are ambiguities.
- @KrzysztofCwalina will work offline on this.
Why are there two types - FormRecognizer / GetModelReference?
- Discoverability - reuse.
Will they be using one model or multiple models?
- It sounds like "one model" is the norm, but maybe unsupervised is a little more complex?
What operations are happening off the FormRecognizerClient?
- It will look like the swagger - each first level segment will mirror on the client.
Today, majority will be training - we expect that to continue
There are three mechanisms to fix the client/model ref. What's the mental model?
1. analyze takes the modelId
2. get model ref
3. new client with a modelId as an arg.
@KrzysztofCwalina is skeptical that the API is stable over even the short term. Service team agrees.
The model versions are not compatible between API service versions.
Service team wants to retain the ability to version the schema of the model over time.
- Need to discuss library versions vis-à-vis service API version
- Seeing the 1-pager "best practices for resilient apps" would be good.
Can you do the evolution of the API through additions? New models, new methods, new types?
Can we have a version number in the package name? NO!
We need a business value assigned to major version changes.
Should the TOKENS be in a separate collection?
- Good observation - we don't have the right people in the room.
- We might want to be Label-Value instead of Key-Value
- How close do we want to be to REST API?
  - No agreement, but developers should not be confused between REST, Portal, and SDK for terminology. Parity between REST & SDK is not important, but parity in concepts is important.
We can't use the same FormRecognizerClient - static analyzer throws it out.
- Suppress the warnings.
Streams would need to be seekable to support retries.
There is an issue with providing a URL that is passed to the server - reachability concerns.
- Also permissions
- Is there a pattern that makes it a lot clearer to the developer?
Recommendation: FormReceiptClient => ReceiptRecognizerClient
Recommendation: Remove polymorphism / AnalyzeClient - it's only beneficial to us and has issues with intellisense.
Recommendation: StartAnalysisAsync() => StartRecognitionAsync()
- Why is it a long running operation? It can be multiple seconds. P99 > 500ms (Azure REST API guidelines)
- Look at changing the API surface to not use long-running operations. It's OK to hide the long-running operation behind a standard operation.
Should we have separate models for receipt, passport, etc.? YES!
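
To illustrate the note above that streams would need to be seekable to support retries, here is a minimal sketch. The helper send_with_retries and its send callback are hypothetical stand-ins for the client pipeline, and the choice of ConnectionError as the retryable error is an assumption.

```python
import io
from typing import BinaryIO, Callable, TypeVar

T = TypeVar("T")


def send_with_retries(stream: BinaryIO, send: Callable[[BinaryIO], T], attempts: int = 3) -> T:
    """Hypothetical helper: re-send the same request body on transient failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if not stream.seekable():
        raise ValueError("stream must be seekable so the body can be re-sent on retry")
    start = stream.tell()
    last_error = None
    for _ in range(attempts):
        # Rewind first: a failed attempt may already have consumed part of the body.
        stream.seek(start)
        try:
            return send(stream)
        except ConnectionError as error:  # assumed to be the retryable error type
            last_error = error
    raise last_error
```

Without the rewind, the second attempt would upload an empty or truncated body, which is why a non-seekable stream is rejected up front in this sketch.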

Maybe do an experiment where we release separate libraries (business level vs. lower-level)

Main customer is PowerApps.
Whatever we do should be usable by PowerApps

[ ] TODO: @annelo-msft and @KrzysztofCwalina will help design higher-level API (split or not)
[ ] TODO: We need to get on the same page for versioning of the library.
[ ] TODO: Align guidance for long running operations to operations, aligned with REST API guidelines
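
For the long-running-operations discussion in these notes, the following sketch shows one way a polling loop could be hidden behind an ordinary blocking call. Everything here is hypothetical: the names recognize_blocking, start and get_status, the status strings, and the decision to raise on failure (which the board left open) are all assumptions, not the Azure Core design.

```python
import time
from typing import Any, Callable, Tuple


def recognize_blocking(
    start: Callable[[], str],
    get_status: Callable[[str], Tuple[str, Any]],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Start an operation, poll until it finishes, and return its result."""
    operation_id = start()
    deadline = time.monotonic() + timeout
    while True:
        status, result = get_status(operation_id)
        if status == "succeeded":  # assumed terminal status value
            return result
        if status == "failed":  # assumed terminal status value
            raise RuntimeError(f"operation {operation_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {operation_id} did not finish in {timeout}s")
        sleep(poll_interval)
```

The caller sees a single call that returns the result or raises; the operation ID and the polling cadence stay internal.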

annelo-msft commented 4 years ago

The Basics

Service team responsible for the client library: Azure SDK
Link to documentation describing the service: https://azure.microsoft.com/services/cognitive-services/form-recognizer/
Contact email: @mayurid, @annelo-msft

About this client library

Name of the client library: formrecognizer
Languages for this review: .NET, Python
Link to the service REST APIs: https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview

Artifacts required (per language)

We use an API review tool (apiview) to support .NET and Java API reviews. For Python and TypeScript, use the API extractor tool, then submit the output as a Draft PR to the relevant repository (azure-sdk-for-python or azure-sdk-for-js).

.NET

Upload DLL to apiview. Link: https://apiview.dev/Assemblies/Review/cf2c57b3d4c44ef08c2e8284f7441a2d
Link to samples for champion scenarios: https://github.com/Azure/azure-sdk-for-net/tree/master/sdk/formrecognizer/Azure.AI.FormRecognizer/samples

Python

Upload the api as a Draft PR. Link to PR: https://apiview.dev/Assemblies/Review/36ac7c8c633d4bf6a45ee8672243f195
Link to samples for champion scenarios: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer/samples

Champion Scenarios

A champion scenario is a use case that the consumer of the client library is commonly expected to perform. Champion scenarios are used to ensure the developer experience is exemplary for the common cases. You need to show the entire code sample (including error handling, as an example) for the champion scenarios.

Recognize and extract field values from custom forms

Target audience: LOB developers (~90%)
Code samples: .NET | Python

Recognize and extract field values from US Receipts

Target audience: Day 0 developers, LOB developers (~30%)
Code samples: .NET | Python

Recognize and extract OCR Content and Tables from Forms

Target audience: LOB developers (~20%)
Code samples: .NET | Python

Train a model from custom forms

Target audience: Data Scientists (~50%)
Code samples: .NET | Python

Manage custom models

Target audience: Data Scientists (~50%)
Code samples: .NET | Python

Agenda for the review

A board review is generally split into two parts, with additional meetings as required

Part 1 - Introducing the board to the service:

Review of the service (no more than 10 minutes).
Review of the champion scenarios.
Get feedback on the API patterns used in the champion scenarios.

After part 1, you may schedule additional meetings with architects to refine the API and work on implementation.

Part 2 - the "GA" meeting

Scheduled at least one week after the APIs have been uploaded for review.
Will go over controversial feedback from the line-by-line API review.
Exit meeting with concrete changes necessary to meet quality bar.

Thank you for your submission

kyle-patterson commented 4 years ago

Scheduled for 4/30

kyle-patterson commented 4 years ago

Link to recording (MSFT INTERNAL): https://msit.microsoftstream.com/video/dd4ba1ff-0400-96d1-ebd7-f1ea8fed4db1

.NET Data Samples

Q: Why are form field and form page on the same level?

User studies show that users would prefer to focus on the fields separate from the pages

Q: Is there a risk of a large number of extension methods to identify receipt locales?

Yes, that's a risk. For areas with many different types of forms that are logically related, we could provide different packages containing those locales.
For now, US locale is a known supported area, so included in main package.
Concern about doing this in languages without extension methods.

Q: Is ReceiptLocale the best first name/concept, if there are different structures of receipts within locales? Could this be generalized outside of locale?

From the service team: the receipts being collected from providers don't have a locale issue

FOLLOW-UP: Consider how new receipt types with strongly-typed fields would be added to the API
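
One possible shape for the follow-up above, sketched under assumptions: a registry of converters from generic recognized fields to strongly-typed receipts, so new receipt types are added by registration rather than by new extension methods. The names USReceipt, as_us_receipt, register_receipt_type and convert_receipt, the "en-US" key, and the "MerchantName" and "Total" field names are illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class USReceipt:
    """Hypothetical strongly-typed view over generic receipt fields."""
    merchant_name: Optional[str]
    total: Optional[float]


def as_us_receipt(fields: Dict[str, Any]) -> USReceipt:
    # Field names are assumptions made for this sketch.
    total = fields.get("Total")
    return USReceipt(
        merchant_name=fields.get("MerchantName"),
        total=float(total) if total is not None else None,
    )


# Registry keyed by receipt type; new types are added without changing callers.
_CONVERTERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {"en-US": as_us_receipt}


def register_receipt_type(key: str, converter: Callable[[Dict[str, Any]], Any]) -> None:
    _CONVERTERS[key] = converter


def convert_receipt(key: str, fields: Dict[str, Any]) -> Any:
    try:
        converter = _CONVERTERS[key]
    except KeyError:
        raise ValueError(f"no strongly-typed receipt registered for {key!r}") from None
    return converter(fields)
```

Because the lookup is a plain function call, the same pattern is available in languages without extension methods, which was one of the concerns raised above.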

Q: Could useTrainingLabels be made a required field, rather than optional with default:false?

FOLLOW-UP with service team on what is consistent with REST, expected customer behavior, etc.

Concern: Having two clients in the Client->TrainingClient is a strange way of addressing discoverability problem of training. Could the direction be flipped such that the training client flows into a client?

Discoverability could be solved with documentation.
RECOMMENDATION: Investigate moving training client to point to recognition client, and then investigate discoverability, based on customer feedback of the preview. Address this for preview-3 to determine GA requirements.
Consistency across languages is important to consider in this, too

Management Operations

Q: For management APIs "LastModified" is actually "LastUpdated" in the swagger

This was trying to be consistent with other SDKs.
FOLLOW-UP: Investigate the correct nomenclature

Copy operations

Q: A scenario is two different entities with no keys to resource/subscription/credentials of the other entities. How is copy covered out-of-band in this case?

FOLLOW-UP to consider this case. There can also be a discussion about convenience APIs as currently designed

ApiView

Q: Is there model versioning, i.e., can clientv2 call modelv3?

This isn't supported in the service currently; forward- and backward-compatibility is unsupported, as the service response may be different. Cross-version models aren't even visible, you'd get a 404 error.
This means REST breaks between service versions
FOLLOW-UP on REST plans and conforming with SDK/REST guidelines to understand this problem scope and how/when to address

FOLLOW-UP on incorporating classification of recognized model into the model/submodel name

Python Data Samples

Overall, largely similar in both usage and concerns as .NET review.

Differentiation: No AsUSReceipt extension; instead, determined by receiptlocale that's passed in to the client functions

Q: How much variation is expected in a receipt based on locale?

Quite a lot. The output will remain the same, but the contents of the receipt may vary

Q: Is there a scenario where the customer doesn't know the locale of the receipt?

In theory it's possible, but the service doesn't yet support this

NOTE: In samples, include retrieving the confidence
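
A sample that retrieves the confidence might look roughly like the sketch below. The RecognizedField type, the 0.0 to 1.0 confidence range, and the 0.8 review threshold are assumptions for illustration, not the library's actual model types.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizedField:
    """Hypothetical stand-in for a field returned by a recognize call."""
    name: str
    value: str
    confidence: float  # assumed to range from 0.0 to 1.0


def low_confidence_fields(fields: List[RecognizedField], threshold: float = 0.8) -> List[str]:
    """Return the names of fields whose confidence falls below the threshold."""
    return [field.name for field in fields if field.confidence < threshold]


fields = [
    RecognizedField("MerchantName", "Contoso", 0.97),
    RecognizedField("Total", "12.50", 0.61),
]
for field in fields:
    print(f"{field.name}: {field.value} (confidence {field.confidence:.2f})")
print("needs review:", low_confidence_fields(fields))
```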
