Open mikebranstein opened 6 years ago
PR #82 submitted.
Hi,
I have used the sample and changed the URI to wss://westus.stt.speech.microsoft.com in speechConnectionFactory.js. I kept getting a 403 Forbidden error.
May I know what the URI should be?
Thanks in advance.
@mageshpurpleslate it depends. If you're using the Bing Speech service, nothing needs to change. If you're using the Custom Speech Service, you need to append an endpoint ID to the URI. Check out PR #82 for the details on everything that needs to change.
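For anyone following along, here's a rough sketch of what the resulting connection URI could look like once the endpoint ID is appended as the cid query parameter. All values in angle brackets are placeholders, and the exact parameter set may differ from what PR #82 generates:

// Sketch only: building a Custom Speech WSS URI with the endpoint ID
// supplied as the `cid` query parameter. All values are placeholders.
const region = "westus";
const endpointId = "<custom-speech-endpoint-id>"; // the GUID from the Custom Speech portal
const subscriptionKey = "<subscription-key>";

const uri = "wss://" + region + ".stt.speech.microsoft.com" +
    "/speech/recognition/interactive/cognitiveservices/v1" +
    "?cid=" + endpointId +
    "&format=simple" +
    "&language=en-US" +
    "&Ocp-Apim-Subscription-Key=" + subscriptionKey;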
Mike,
Thanks a lot for your help. Works like a charm. Pretty nicely done.
Is it acceptable to send the API subscription key in the query parameter? Are you planning any changes to that?
Regards,
Magesh
@mageshpurpleslate I'm glad this worked for you - I recall starting out with Bing and Custom Speech about a year ago, and the samples were pretty rough.
There are two ways to authenticate to the speech services with WebSockets. The first is the query string format. It's acceptable to send the key that way because it's over HTTPS (WSS). The second way is to pre-authenticate with an HTTP POST to the Cognitive Services secure token service. This returns a bearer token that is added to the WebSocket connection header. Docs on how to do this are here.
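If it helps, here's a minimal Node.js sketch of the second approach, assuming the westus token endpoint and the ws package. The URI and parameter names mirror the ones used elsewhere in this thread; treat it as an illustration rather than the SDK's implementation:

// Sketch: pre-authenticate with an HTTP POST to the secure token service,
// then open the WebSocket with an Authorization header instead of the
// query-string subscription key. Requires the `ws` package in Node.js.
const https = require("https");
const WebSocket = require("ws");

function fetchToken(subscriptionKey) {
    // The response body of the POST is the bearer token itself.
    return new Promise((resolve, reject) => {
        const req = https.request({
            method: "POST",
            host: "westus.api.cognitive.microsoft.com",
            path: "/sts/v1.0/issueToken",
            headers: { "Ocp-Apim-Subscription-Key": subscriptionKey }
        }, res => {
            let body = "";
            res.on("data", chunk => body += chunk);
            res.on("end", () => resolve(body));
        });
        req.on("error", reject);
        req.end();
    });
}

fetchToken("<subscription-key>").then(token => {
    const ws = new WebSocket(
        "wss://westus.stt.speech.microsoft.com/speech/recognition/interactive/cognitiveservices/v1" +
        "?format=simple&language=en-US",
        { headers: { Authorization: "Bearer " + token } }
    );
    ws.on("open", () => console.log("WebSocket connected"));
});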
@mikebranstein - I tried the custom speech implementation with your proposed code changes, but I am getting a "403 Forbidden" error on the WSS call. The path I copied from the F12 dev tools looks like:
wss://westus.stt.speech.microsoft.com/speech/recognition/interactive/cognitiveservices/v1?cid=https://westus.api.cognitive.microsoft.com/sts/v1.0&format=simple&language=en-US&Ocp-Apim-Subscription-Key=<...... key......>&X-ConnectionId=<..... connection id .... >
Is this a valid path format? Have you ever faced a 403 error during your testing?
@mraguraman3 I believe you are placing the entire endpoint URL in the "Custom Speech Endpoint ID" textbox. Instead, use the Endpoint ID, which is a GUID. You can find the endpoint ID on the custom speech portal.
Thanks a lot @mikebranstein, it's working now after placing the endpoint ID.
Also, you mentioned that we can use token-based authentication. In that case, do we not need to pass the endpoint ID in the HTTP POST header, and is the subscription key alone enough to generate the token?
@mraguraman3 - yes, token-based auth is also available. I did not use token-based auth because the original solution used the query string parameter auth. I wanted to augment the solution in a specific way for this PR. A different PR would be necessary to change the auth.
Thanks @mikebranstein.
Anyway, I can confirm token-based auth is not working with the custom speech implementation.
Though the token is generated using the subscription key, I am getting 401 Unauthorized when hitting the WebSocket. Below is the WSS call format:
wss://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?cid=#endpointID#&format=simple&language=en-US&Authorization=#token#
@mraguraman3 token-based auth does work, but it's tricky. I have a C# SDK I had to roll for the Custom Speech Service WebSocket speech protocol before Microsoft released their own.
Great @mikebranstein, but I am using a JavaScript Node app to generate the token. Anyway, all I want to know is whether this is a valid WSS call format or whether I am missing something:
wss://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?cid=#endpointID#&format=simple&language=en-US&Authorization=#token#
@mraguraman3 the C# SDK was released after //BUILD this year. You can find it on NuGet: https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech/.
Thanks @mikebranstein, but I don't think there is an option in this SDK to provide the endpoint ID for custom speech.
Only EndpointURL is supported, which I believe is the actual HTTP host for the speech service. Here is the documentation of supported properties in the C# SDK:
Do you have any plans to support token auth in "SpeechToText-WebSockets-Javascript" for custom speech?
@mraguraman3 from what I understand, the endpoint ID becomes part of the URL you set in the EndpointURL property. So, the EndpointURL for custom speech could be wss://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?cid=#endpointID#.
Underneath the SDK the streaming protocol supported is the Speech Service WebSocket protocol, outlined here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/websocketprotocol.
If you were going to implement the speech protocol yourself, you'd have to request an auth token using your subscription key (like the code snippet below).
private async Task<string> FetchToken()
{
    // POST to the Cognitive Services secure token service; the subscription key
    // goes in the Ocp-Apim-Subscription-Key header and the response body is the token.
    using (var client = new HttpClient())
    {
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<Subscription Key>");
        UriBuilder uriBuilder = new UriBuilder("https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken");

        var result = await client.PostAsync(uriBuilder.Uri.AbsoluteUri, null);
        return await result.Content.ReadAsStringAsync();
    }
}
When you have that token, you can use a ClientWebSocket and set the Authorization bearer token on the WebSocket connection. Assuming _cws is the ClientWebSocket:
var authToken = await FetchToken();
_cws.Options.SetRequestHeader("Authorization", $"Bearer {authToken}");
Reviewing the JavaScript SDK, it does support auth token connections. The sample HTML does not use them, but you can modify the sample code slightly to take advantage of the auth token approach. See https://github.com/Azure-Samples/SpeechToText-WebSockets-Javascript/blob/477067fe264159e7ccbd233a01015f9ea03a6d06/samples/browser/Sample.html#L166. I believe you can change this value to true and your solution would use the auth token approach.
Hi @mikebranstein, I am here to check one more item with you. Is there a way we can save the clip while it is being sent for recognition as well? We are trying to save it for auditing purposes.
@mageshpurpleslate there's no native SDK way of doing this (to my knowledge), so you'd have to write the code yourself. For example, you could write a middle layer that collects the audio from a microphone, funnels it to your desired location, then writes the same stream to the Speech SDK. If you don't want to do that client-side with JavaScript, you could host your own WebSocket app that uses the C# Speech SDK. Your WebSocket app would act as the middle layer, intercepting the audio stream. I have a solution that does this, hosted as a Service Fabric WebSocket app in Azure.
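For the client-side JavaScript route, the middle layer can be a simple "tee" over the audio chunks. A minimal sketch, assuming you already have access to the raw chunks from the microphone capture; sendToSpeechSdk and uploadForAudit are hypothetical placeholders for however your app feeds the SDK and stores the audio:

// Sketch of a "tee" middle layer: each audio chunk is kept for auditing
// and also forwarded, unchanged, to the recognizer.
// `sendToSpeechSdk` and `uploadForAudit` are hypothetical placeholders.
const auditedChunks = [];

function onAudioChunk(chunk) {
    auditedChunks.push(chunk);   // keep a copy for the audit trail
    sendToSpeechSdk(chunk);      // forward the same bytes for recognition
}

function finishRecording() {
    // Combine the audited chunks into one clip and hand it to your audit store.
    const clip = new Blob(auditedChunks, { type: "audio/wav" });
    uploadForAudit(clip);
}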
@mageshpurpleslate After thinking about it for a few more minutes: the C# SDK has a custom audio source/stream you can create. You could create one that audits the audio bytes as they are being fed to the service via the SDK.
@mikebranstein thank you. This would help. I will try it out.
wss://westus.stt.speech.microsoft.com is working for the latest Speech API.
The JavaScript SDK only works with Bing Speech API endpoints. Custom Speech endpoints need to be supported. PR incoming.