Azure-Samples / SpeechToText-WebSockets-Javascript

SDK & Sample to do speech recognition using websockets in Javascript
MIT License

Sample no longer works with Custom Speech service after //BUILD 2018 product updates #81

Open mikebranstein opened 6 years ago

mikebranstein commented 6 years ago

The JavaScript SDK only works with Bing Speech API endpoints. Custom Speech endpoints need to be supported. PR incoming.

mikebranstein commented 6 years ago

PR #82 submitted.

mageshpurpleslate commented 6 years ago

Hi,

I used the sample and changed the URI to wss://westus.stt.speech.microsoft.com in speechConnectionFactory.js, but I keep getting a 403 Forbidden error.

May I know what should be the URI?

Thanks in Advance.

mikebranstein commented 6 years ago

@mageshpurpleslate it depends. If you're using the Bing Speech service, nothing needs to change. If you're going to use the Custom Speech Service, you need to append an endpoint Id to the URI. Check out PR #82 for the details on everything that needs to change.

mageshpurpleslate commented 6 years ago

Mike,

Thanks a lot for your help. Works like a charm. Pretty nicely done.

Is it acceptable to send the API subscription key as a query parameter? Are you planning to make any changes to that?

Regards,

Magesh

mikebranstein commented 6 years ago

@mageshpurpleslate I'm glad this worked for you - I recall starting out with Bing and Custom Speech ~ a year ago and the samples were pretty rough.

There are two ways to authenticate to the speech services with WebSockets. The first is the query string format. It's acceptable to send the key that way because the connection is encrypted (WSS, the WebSocket counterpart of HTTPS). The second is to pre-authenticate with an HTTP POST to the Cognitive Services secure token service. This returns a bearer token that you add to the Authorization header of the WebSocket connection. Docs on how to do this are here.
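The second approach can be sketched in JavaScript roughly as follows. This is a minimal sketch, not the SDK's own code: `buildTokenRequest` and `fetchToken` are hypothetical helper names, the `region` parameter assumes the token host follows the same pattern as the westus URL quoted later in this thread, and `fetchImpl` defaults to the global `fetch` available in browsers and recent Node versions.

```javascript
// Hypothetical helper: describes the POST request that asks the token
// service for a bearer token. The region-based host is an assumption.
function buildTokenRequest(subscriptionKey, region = "westus") {
  return {
    url: `https://${region}.api.cognitive.microsoft.com/sts/v1.0/issueToken`,
    options: {
      method: "POST",
      headers: { "Ocp-Apim-Subscription-Key": subscriptionKey },
    },
  };
}

// Hypothetical helper: sends the request and returns the token text.
// fetchImpl is injectable so the function can be exercised without a network.
async function fetchToken(subscriptionKey, region = "westus", fetchImpl = globalThis.fetch) {
  const { url, options } = buildTokenRequest(subscriptionKey, region);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    // Surface auth failures instead of treating an error body as a token.
    throw new Error(`Token request failed with HTTP ${response.status}`);
  }
  return response.text();
}
```

Keeping the request description separate from the network call makes it easy to inspect exactly which header and URL are being sent when a 401 or 403 comes back.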

mraguraman3 commented 6 years ago

@mikebranstein - I tried the custom speech implementation with your proposed code changes, but I am getting a "403 Forbidden" error on the WSS call. The path I copied from the F12 dev tools looks like:

wss://westus.stt.speech.microsoft.com/speech/recognition/interactive/cognitiveservices/v1?cid=https://westus.api.cognitive.microsoft.com/sts/v1.0&format=simple&language=en-US&Ocp-Apim-Subscription-Key=<...... key......>&X-ConnectionId=<..... connection id .... >

Is this a valid path format? Did you ever hit a 403 error during your testing?

mikebranstein commented 6 years ago

@mraguraman3 I believe you are placing the entire endpoint URL in the "Custom Speech Endpoint ID" textbox. Instead, use the Endpoint ID, which is a GUID. You can find the endpoint ID on the custom speech portal.
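As an illustration of the difference, here is a minimal sketch that builds the WebSocket URL from an endpoint ID and rejects a full URL passed by mistake. `buildCustomSpeechUrl` is a hypothetical helper, not part of the SDK; the path and query parameters are taken from the URLs quoted in this thread, and the GUID pattern is an assumption about how the portal displays the ID.

```javascript
// Loose GUID check (hyphens optional). The exact format shown in the
// Custom Speech portal is an assumption.
const GUID_PATTERN = /^[0-9a-f]{8}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{12}$/i;

// Hypothetical helper: builds the Custom Speech WebSocket URL, with the
// endpoint ID passed as the `cid` query parameter.
function buildCustomSpeechUrl(region, endpointId, mode = "conversation") {
  if (!GUID_PATTERN.test(endpointId)) {
    throw new Error("Expected the endpoint ID (a GUID), not the full endpoint URL");
  }
  const url = new URL(
    `wss://${region}.stt.speech.microsoft.com/speech/recognition/${mode}/cognitiveservices/v1`
  );
  url.searchParams.set("cid", endpointId);
  url.searchParams.set("format", "simple");
  url.searchParams.set("language", "en-US");
  return url.toString();
}
```

Calling it with a value such as `https://westus.api.cognitive.microsoft.com/sts/v1.0` throws immediately, which is easier to diagnose than a 403 from the service.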

mraguraman3 commented 6 years ago

Thanks a lot @mikebranstein, it's working now after placing the endpoint ID.

You also mentioned that we can use token-based authentication. In that case, we don't need to pass the endpoint ID in the HTTP POST header, and the subscription key alone is enough to generate the token?

mikebranstein commented 6 years ago

@mraguraman3 - yes, token-based auth is also available. I did not use token-based auth because the original solution used the query string parameter auth. I wanted to augment the solution in a specific way for this PR. A different PR would be necessary to change the auth.

mraguraman3 commented 6 years ago

Thanks @mikebranstein .

Anyway, I can confirm that token-based auth is not working with the custom speech implementation.

Although the token is generated using the subscription ID, I am getting 401 Unauthorized when connecting to the WebSocket. Below is the wss call format:

wss://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?cid=#endpointID#&format=simple&language=en-US&Authorization=#token#

mikebranstein commented 6 years ago

@mraguraman3 token based auth does work, but it's tricky. I have a C# SDK I had to roll for the Custom Speech Service websocket speech protocol before Microsoft released their own.

mraguraman3 commented 6 years ago

Great @mikebranstein, but I am using a JavaScript Node app to generate the token. Anyway, all I want to know is whether this is a valid wss call format, or am I missing something?

wss://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?cid=#endpointID#&format=simple&language=en-US&Authorization=#token#

mikebranstein commented 6 years ago

@mraguraman3 the C# SDK was released after //BUILD this year. You can find it on NuGet: https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech/.

mraguraman3 commented 6 years ago

Thanks @mikebranstein, but I don't think this SDK has an option to provide the endpoint ID for custom speech.

Only EndpointURL is supported, which I believe is the actual HTTP host for the speech service. Here is the documentation of the supported properties in the C# SDK:

https://docs.microsoft.com/en-gb/dotnet/api/microsoft.cognitiveservices.speech.speechfactory?view=azure-dotnet

Do you have any plans to support token auth in "SpeechToText-WebSockets-Javascript" for custom speech?

mikebranstein commented 6 years ago

@mraguraman3 from what I understand, the EndpointURL property is part of the URL. So, the EndpointURL for custom speech could be wss://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?cid=#endpointID#.

Underneath the SDK the streaming protocol supported is the Speech Service WebSocket protocol, outlined here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/websocketprotocol.

If you were going to implement the speech protocol yourself, you'd have to request an auth token using your subscription key (the value sent in the Ocp-Apim-Subscription-Key header), as in the code snippet below.

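// Requires: using System; using System.Net.Http; using System.Threading.Tasks;
private async Task<string> FetchToken()
{
  using (var client = new HttpClient())
  {
    // The subscription key is sent as a request header.
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<Subscription Id>");
    var tokenUri = new Uri("https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken");

    // POST with an empty body; the response body is the token text.
    var result = await client.PostAsync(tokenUri, null);
    // Throw on a non-success status instead of returning an error body as a token.
    result.EnsureSuccessStatusCode();
    return await result.Content.ReadAsStringAsync();
  }
}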

When you have that token, you can use a ClientWebSocket and set the Authorization bearer token on the web socket connection. Assuming _cws is the client web socket:

var authToken = await FetchToken();
_cws.Options.SetRequestHeader("Authorization", $"Bearer {authToken}");
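For a Node app, the same idea might look like the sketch below: the token travels in an Authorization header, not as an `Authorization=` query parameter. `buildBearerConnection` is a hypothetical helper that only assembles the URL and headers; how they are passed to a socket depends on the WebSocket client in use. The browser `WebSocket` constructor cannot set custom request headers, so this sketch assumes a Node client library that accepts a headers option.

```javascript
// Hypothetical helper: assembles the connection details for token-based auth.
// The URL shape is copied from the wss URLs quoted in this thread.
function buildBearerConnection(region, endpointId, authToken) {
  const url =
    `wss://${region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1` +
    `?cid=${encodeURIComponent(endpointId)}&format=simple&language=en-US`;
  return {
    url,
    // The bearer token belongs in a request header, not in the query string.
    headers: { Authorization: `Bearer ${authToken}` },
  };
}
```

With a client such as the `ws` package, the result would typically be used as `new WebSocket(conn.url, { headers: conn.headers })`; treat that call shape as an assumption and check it against the library version you use.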

After reviewing the JavaScript SDK, I see that it supports auth token connections. The sample HTML does not use them, but you can modify the sample code slightly to take advantage of the auth token approach. See https://github.com/Azure-Samples/SpeechToText-WebSockets-Javascript/blob/477067fe264159e7ccbd233a01015f9ea03a6d06/samples/browser/Sample.html#L166. I believe you can change this value to true and your solution would use the auth token approach.

mageshpurpleslate commented 6 years ago

Hi @mikebranstein, I have one more question for you. Is there a way to save the audio clip while it is also being sent for recognition? We are trying to save it for auditing purposes.

mikebranstein commented 6 years ago

@mageshpurpleslate there's no native SDK way of doing this (to my knowledge), so you'd have to write the code to do this. For example, you could write a middle layer that collects the audio from a microphone, then funnels it to your desired location, then writes the same stream to the Speech SDK. If you don't want to do that client-side with JavaScript, then you could host your own WebSocket app that uses the C# Speech SDK. Your websocket app would act as the middle layer, intercepting the audio stream. I have a solution that does this that is hosted as a Service Fabric Web Socket app in Azure.

mikebranstein commented 6 years ago

@mageshpurpleslate After thinking for a few more minutes, the C# SDK has a custom audio source/stream you can create. You could create one that audits the audio bytes as they are being fed to the service via the SDK.
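A client-side version of that middle layer could be sketched as below. This is only an illustration of the idea, not an SDK feature: `createAuditingForwarder` is a hypothetical helper, and `forward` stands in for whatever function hands audio chunks to the recognizer.

```javascript
// Hypothetical helper: keeps a copy of every audio chunk for auditing and
// passes the original chunk on to the recognizer unchanged.
function createAuditingForwarder(forward, auditChunks = []) {
  return {
    write(chunk) {
      // Copy the bytes so later reuse of the caller's buffer cannot
      // alter the audit record.
      auditChunks.push(Buffer.from(chunk));
      forward(chunk);
    },
    // Returns all audio captured so far as a single Buffer.
    recordedAudio() {
      return Buffer.concat(auditChunks);
    },
  };
}
```

The recorded buffer can then be written to storage once recognition ends. For long sessions, flushing chunks to disk as they arrive would avoid holding the whole clip in memory.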

mageshpurpleslate commented 6 years ago

@mikebranstein thank you. This would help. I will try it out.

hellowonders commented 5 years ago

wss://westus.stt.speech.microsoft.com works with the latest Speech API.