[Speech Services - Speech Analytics] API Review

azure-sdk commented 6 months ago

New API Review meeting has been requested.

Service Name: Speech Services - Speech Analytics
Review Created By: Nate Ko
Review Date: 04/02/2024 01:00 PM PT
Release Plan:
PR: https://github.com/Azure/azure-rest-api-specs/pull/28521
Hero Scenarios Link: Not Provided
Core Concepts Doc Link: here

Description: The Ingestion Service is a new feature of Speech Services. Customers can use the service to register a storage account so that files are processed automatically when they are added to the blob storage account. Processing currently includes transcription of the audio files and a post-analytics call via webhook (e.g. a PromptFlow online endpoint).

The service adds a /registrations API where the customer configures information about their storage account, transcription behavior, and the webhook endpoint for post-analytics. For authentication, the API supports Cognitive Services key and token. Before registering, the customer needs to enable managed identity (MI) on the Cognitive Services resource and assign the roles (Storage Blob Data Contributor, Cognitive Services User, AzureML Data Scientist) so that the service can access the customer's storage, make a batch transcription call, and call the (PromptFlow) analytics endpoint.

Detailed meeting information and documents provided can be accessed here. For more information that will help prepare you for this review, the requirements, and office hours, visit the documentation here.

azure-sdk commented 5 months ago

Meeting updated by Nate Ko

Service Name: Speech Services - Speech Analytics
Review Created By: Nate Ko
Review Date: 04/02/2024 01:00 PM PT
Release Plan: 1257
PR: https://github.com/Azure/azure-rest-api-specs/pull/28521
Hero Scenarios Link: Not Provided
Core Concepts Doc Link: here

mikekistler commented 5 months ago

Notes from API Review 4/2/24

Don't put preview in the URL
Use PATCH (preferred) or PUT for Create
- Also PATCH for Update (version tolerant)
Use only the operation templates from the TypeSpec Azure library
- ResourceCreateOrUpdate for Create and Update
Names for storage properties should be more descriptive
Consider whether to make the storage configuration more generic if other storage providers will be supported in the future.
"polling" might be clearer than "poll"
Might want to model this as a polymorphic resource type on deliveryMode
Speech needs to get to a consistent versioning scheme and version all APIs together
- Is there just one SDK or multiple SDKs for Speech?
- Are the SDKs Track 1 or Track 2? Rob Chambers
Need to make this consistent with the rest of the GA service
- Your choice to use TypeSpec for that or just stick with OpenAPI

Recommend coming to API Stewardship office hours to work out a plan for this.
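To illustrate the PATCH recommendation in the notes above: as I understand the Azure REST API guidelines, PATCH is expected to follow JSON Merge Patch (RFC 7396), where the client sends only the members it wants to change and `null` deletes a member. The sketch below is a minimal Python implementation of that merge rule, not the service's code. The `displayName` and `trigger` fields are hypothetical stand-ins for a registration resource.

```python
from copy import deepcopy
from typing import Any


def merge_patch(target: Any, patch: Any) -> Any:
    """Apply a JSON Merge Patch (RFC 7396) and return the patched document."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return deepcopy(patch)
    # Start from a copy so the caller's document is never mutated.
    result = deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null means "remove this member".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Hypothetical registration resource, for illustration only.
registration = {
    "displayName": "call-center-audio",
    "trigger": {"kind": "EventGrid", "filter": "*.wav"},
}
patched = merge_patch(registration, {"trigger": {"kind": "Polling", "filter": None}})
print(patched)
```

Members the patch does not mention (here `displayName`) are left untouched, which is presumably what makes PATCH the version-tolerant choice for updates.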

mhko commented 5 months ago

Your choice to use TypeSpec for that or just stick with OpenAPI

We used OpenAPI spec.

Don't put preview in the URL

Changed to v0.2-preview. This versioning scheme is consistent with the rest of Speech Services (batch), e.g. https://eastus.ingestion.speech.microsoft.com/v0.2-preview/registrations

Use PATCH (preferred) or PUT for Create

Speech Services uses POST for create. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/batch-transcription-create?pivots=rest-api#create-a-transcription-job. We opted for consistency.
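
For reference, a rough sketch of what a POST create against the URL above could look like from a client. It only builds the request and does not send it. The body shape (`displayName`) is a hypothetical placeholder, and the `Ocp-Apim-Subscription-Key` header is the usual Azure AI services key header, assumed here to apply to this endpoint as well.

```python
import json
import urllib.request

# Base URL shape taken from the example URL earlier in this reply.
BASE_URL = "https://{region}.ingestion.speech.microsoft.com/v0.2-preview"


def build_create_request(region: str, key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that creates a registration."""
    return urllib.request.Request(
        url=BASE_URL.format(region=region) + "/registrations",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the standard Azure AI services key header is accepted.
            "Ocp-Apim-Subscription-Key": key,
        },
    )


# Hypothetical request body; the real registration schema is defined in the PR.
request = build_create_request("eastus", "<your-key>", {"displayName": "demo"})
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; that step is left out because it needs a real key and resource.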

Also PATCH for Update (version tolerant)

Speech Services uses PATCH for update (change pending 5/1/2024).

Use only the operation templates from the TypeSpec Azure library

Per the discussion above, we opted for the OpenAPI spec.

Names for storage properties should be more descriptive

Updated.

Consider whether to make the storage configuration more generic if other storage providers will be supported in the future.
"polling" might be clearer than "poll"
Might want to model this as a polymorphic resource type on deliveryMode

Updated to

"trigger": {
  "kind": "EventGrid|Polling",
  "filter": null,
  "systemTopicResourceId": "/subscriptions/2c2e6d10-4e48-40fd-8f4d-d9fb770d0c6d/resourceGroups/speechingestiontest/providers/Microsoft.EventGrid/systemTopics/systemtopicbyos"
},
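
A minimal sketch of how a client might treat this as a polymorphic type keyed on "kind". The class names are hypothetical, and two things are assumptions rather than facts from the spec: that "systemTopicResourceId" only applies to the EventGrid kind, and that "filter" is an optional string.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class EventGridTrigger:
    # Assumption: only the EventGrid kind carries a system topic resource ID.
    system_topic_resource_id: str
    filter: Optional[str] = None
    kind: str = "EventGrid"


@dataclass
class PollingTrigger:
    filter: Optional[str] = None
    kind: str = "Polling"


Trigger = Union[EventGridTrigger, PollingTrigger]


def parse_trigger(data: dict) -> Trigger:
    """Pick the concrete trigger type from the "kind" discriminator."""
    kind = data.get("kind")
    if kind == "EventGrid":
        return EventGridTrigger(
            system_topic_resource_id=data["systemTopicResourceId"],
            filter=data.get("filter"),
        )
    if kind == "Polling":
        return PollingTrigger(filter=data.get("filter"))
    raise ValueError(f"unknown trigger kind: {kind!r}")


print(parse_trigger({"kind": "Polling", "filter": None}))
```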

Speech needs to get to a consistent versioning scheme and version all APIs together

We opted for a versioning scheme consistent with the Batch Transcription Service (GA).

Is there just one SDK or multiple SDKs for Speech?

AFAIK, there's a handcrafted SDK (Carbon - Rob Chambers) and a generated one (azure-rest-api-specs\specification\cognitiveservices\data-plane\Speech\BatchTextToSpeech) for Batch Transcription, but I haven't seen the latter. Started an email thread with Oliver, who owns Speech Services all up.

Are the SDKs Track 1 or Track 2 ?

Rob Chambers

Need to make this consistent with the rest of the GA service

Choices are made based on consistency.

Pull Request https://github.com/Azure/azure-rest-api-specs/pull/28888

mikekistler commented 4 months ago

Notes from follow-up review meeting w/ Mike Kistler & Jeff Richter:

The changes to address comments from the prior review look fine, but there needs to be uniform versioning across the service to comply with Azure's Versioning Policy. We'll let this go in this PR, but it should be fixed as soon as possible and definitely before any new GA.

azure-sdk commented 4 months ago

Meeting updated by Nate Ko

Service Name: Speech Services - Speech Analytics
Review Created By: Nate Ko
Review Date: 04/02/2024 01:00 PM PT
Release Plan: 1257
PR: https://github.com/Azure/azure-rest-api-specs/pull/28888
Hero Scenarios Link: Not Provided
Core Concepts Doc Link: here

Azure / azure-rest-api-specs

[Speech Services - Speech Analytics] API Review #28522