Azure-Samples / Cognitive-Speech-STT-Windows

Windows SDK for the Microsoft Speech-to-Text API, part of Cognitive Services
https://www.microsoft.com/cognitive-services/en-us/speech-api

Call WebSocket service for Long-Running in .net core? #31

Closed maptz closed 6 years ago

maptz commented 6 years ago

Hey,

I am trying to call the WebSocket version of the service from .NET Core, but I notice that this library does not support .NET Core. I feel I should be able to implement the client myself by making the right request via a ClientWebSocket, but I can't get it to work. I came here hoping for answers, but it seems that the C# library that makes these calls isn't open source. Is the source for this library available?

If not, can you help me? I'm trying to connect to the WebSocket implementation of the Bing Speech API because I need to send multiple large files for transcription. It needs to be .NET Core because the service is running on Linux.

I've tried a number of things along the following lines:

    // Requires: using System; using System.Net.WebSockets; using System.Threading;
    var cws = new ClientWebSocket();
    // Request headers must be set before ConnectAsync is called.
    cws.Options.SetRequestHeader("X-ConnectionId", Guid.NewGuid().ToString("N"));
    cws.Options.SetRequestHeader("Authorization", "Bearer " + token);
    await cws.ConnectAsync(new Uri(@"wss://speech.platform.bing.com/speech/recognize/interactive/cognitiveservices/v1"), CancellationToken.None);

priyaravi20 commented 6 years ago

Hi @maptz - You are correct that the current C# SDK is only available in library form. We have recently released the updated WebSocket protocol documentation at https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/websocketprotocol. We plan to have a C# implementation of this protocol in the next few months. In the meantime, you can follow the protocol to write your own .NET Core implementation.
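
For anyone attempting this, here is a minimal sketch of how the message framing might look in C#. It rests on assumptions drawn from one reading of the protocol document linked above, not on the closed-source SDK: text frames are taken to be CRLF-terminated header lines, a blank line, then the body, and binary frames are taken to start with a 2-byte big-endian header length. The class name `SpeechFraming`, its method names, the header set, and the `{}` placeholder body are all hypothetical; check each against the protocol document before relying on them.

```csharp
using System;
using System.Text;

// Hypothetical helper; names and layout are assumptions, not part of any SDK.
public static class SpeechFraming
{
    // Header lines shared by both frame types, each terminated by CRLF.
    private static string BuildHeaders(string path, string requestId, string contentType)
    {
        return "Path: " + path + "\r\n"
             + "X-RequestId: " + requestId + "\r\n"
             + "X-Timestamp: " + DateTime.UtcNow.ToString("o") + "\r\n"
             + "Content-Type: " + contentType + "\r\n";
    }

    // Text frame (assumed layout): headers, one blank line, then the body.
    public static string BuildTextMessage(string path, string requestId, string contentType, string body)
    {
        return BuildHeaders(path, requestId, contentType) + "\r\n" + body;
    }

    // Binary frame (assumed layout): 2-byte big-endian header length,
    // ASCII headers, then the raw body bytes.
    public static byte[] BuildBinaryMessage(string path, string requestId, string contentType, byte[] body)
    {
        byte[] headers = Encoding.ASCII.GetBytes(BuildHeaders(path, requestId, contentType));
        if (headers.Length > ushort.MaxValue)
        {
            throw new ArgumentException("Headers exceed the 2-byte length prefix.");
        }

        var message = new byte[2 + headers.Length + body.Length];
        message[0] = (byte)(headers.Length >> 8);   // high byte of header length
        message[1] = (byte)(headers.Length & 0xFF); // low byte of header length
        Buffer.BlockCopy(headers, 0, message, 2, headers.Length);
        Buffer.BlockCopy(body, 0, message, 2 + headers.Length, body.Length);
        return message;
    }

    public static void Main()
    {
        string requestId = Guid.NewGuid().ToString("N");

        // "{}" is a placeholder body, not a valid configuration payload.
        Console.WriteLine(BuildTextMessage("speech.config", requestId, "application/json; charset=utf-8", "{}"));

        // Three dummy bytes stand in for a chunk of audio data.
        byte[] frame = BuildBinaryMessage("audio", requestId, "audio/x-wav", new byte[] { 1, 2, 3 });
        Console.WriteLine(frame.Length);
    }
}
```

With a connected `ClientWebSocket`, the text payload could be UTF-8 encoded and passed to `SendAsync` with `WebSocketMessageType.Text`, and the binary payload passed with `WebSocketMessageType.Binary`, in both cases wrapped in an `ArraySegment<byte>` with `endOfMessage` set to `true`.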

maptz commented 6 years ago

Thanks Priyaravi. I've been trying to get it to work using the instructions there, but I keep hitting a brick wall: I keep getting Bad Request (HTTP 400) responses when trying to connect to the WebSocket. Are there any pointers from the existing code that you can give me?

priyaravi20 commented 6 years ago

I am not sure whether you are comfortable reading JavaScript code, but the only implementation we currently have is written in JavaScript. It implements the same underlying protocol, so it may help you get started.

https://github.com/Azure-Samples/SpeechToText-WebSockets-Javascript

Hope that helps.

maptz commented 6 years ago

Thanks @priyaravi20. I've tried reading the JavaScript code, but I am still hitting a brick wall with my C# implementation. I'm sure I'm missing something extremely simple. Is anyone on the contributor team who's conversant in C# able to take a look at my implementation? I've posted the question in full here: https://stackoverflow.com/questions/45319819/bing-speech-api-call-websocket-from-net-core

maptz commented 6 years ago

I've really hit a brick wall with this. I've been through the JavaScript code, and I don't appear to be doing anything different.

If anyone on the team is conversant in C# and the Bing Speech WebSocket API, I've put a full sample here. It would be great to get this working, and it would save a lot of pain.

https://stackoverflow.com/questions/45492964/bing-speech-to-text-api-never-receive-message-from-server

priyaravi20 commented 6 years ago

Hi @maptz - Unfortunately, we don't have anyone on the team who can look at issues at the source-code level. However, there is another implementation, written in C, at https://github.com/technicianted/libmsspeech, which I hope can help with the roadblock.

maptz commented 6 years ago

Thanks. I've had a look at that sample, and frustratingly it looks like I'm doing most of the same things. The only thing I can spot that's different is per-message compression. Do you know if this is necessary? It doesn't appear to be in the spec.

priyaravi20 commented 6 years ago

Hi @maptz - Were you able to make progress with the suggested workarounds?

maptz commented 6 years ago

Yes. I got it working. I'll post the working sample onto GitHub at the weekend.

DavidKarlas commented 6 years ago

@maptz Will you post a link to the sample here?

maptz commented 6 years ago

Hi @DavidKarlas ,

I've uploaded my sample C# code here.

arun02139 commented 6 years ago

@priyaravi20 Bump, we are also feeling this "pain" and thanks @maptz for your C# sample!

In our case, we are using Unity as middleware for developing a cross-platform VR education app and are unable to integrate the SpeechClient DLL without recompiling from source :( https://www.nuget.org/packages/Microsoft.ProjectOxford.SpeechRecognition-x64/

I'll take a look at the C and JavaScript implementations. Given the popularity of Unity and the strong collaboration between MSFT and Unity in Mixed Reality, I think this would be a boon for Cognitive Services =D