NateRickard / Xamarin.Cognitive.Speech

A client library that makes it easy to work with the Microsoft Cognitive Services Speech Services Speech to Text API on Xamarin.iOS, Xamarin.Android, UWP, and the Xamarin.Forms/.NET Standard libraries used by those platforms.
MIT License

Problem with UWP sample #3

Closed · moonClimber closed this 7 years ago

moonClimber commented 7 years ago

Hi Nate, thanks for your work! I'm working with your UWP sample and I've found a problem. No errors occur (I mean, the server returns data), but no text is recognized.

In more detail:

  1. provided the subscription key (the key is fine, because I've already used it in other sample projects) and checked the permissions in the manifest
  2. launched the application in debug mode from Visual Studio 2017 15.4
  3. pressed the RECORD ME button (leaving the default settings: Interactive, Simple, Masked, Off)
  4. said something (recording stops automatically after only about 2 seconds - I don't know why, but I don't think that matters right now)
  5. checked whether the wave file was created: it exists, and every time I record, a new file overwrites the previous one (each never longer than 2 seconds)
  6. the request goes to the server, which returns this result: Recognition Status: InitialSilenceTimeout, DisplayText: (empty), Offset: 20000000, Duration: 0

To repeat, the file exists and contains audio from the beginning. I've also tried to debug the Xamarin.Cognitive.BingSpeech (Portable) project, but I don't really understand where the problem is. I've captured traffic through Fiddler with HTTPS sniffing enabled, and I can confirm the request leaves my PC. Here is the header of the request (I omit the encoded audio file because it isn't useful here, but I can see it in Fiddler as well):

POST https://speech.platform.bing.com/speech/recognition/interactive/cognitiveservices/v1?language=it-IT&format=simple&profanity=masked HTTP/1.1
Host: speech.platform.bing.com
Expect: 100-continue
Accept: application/json, text/xml
Authorization: Bearer <long code here, omitted for simplicity>
Content-Type: audio/wav
Connection: Keep-Alive
Content-Length: 181712
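For reference, the same request can be reproduced outside the library with a plain `HttpClient`. This is only a minimal sketch based on the captured endpoint and headers above; how the bearer token is obtained is not shown and is assumed to be handled separately:

```csharp
// Minimal sketch of the captured request, using only the endpoint and headers
// shown above. Token acquisition is assumed to happen elsewhere.
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

class SpeechRequestSketch
{
	const string Endpoint =
		"https://speech.platform.bing.com/speech/recognition/interactive/cognitiveservices/v1" +
		"?language=it-IT&format=simple&profanity=masked";

	public static async Task<string> RecognizeAsync (string wavPath, string bearerToken)
	{
		using (var client = new HttpClient ())
		using (var content = new StreamContent (File.OpenRead (wavPath)))
		{
			client.DefaultRequestHeaders.Authorization =
				new AuthenticationHeaderValue ("Bearer", bearerToken);
			content.Headers.ContentType = new MediaTypeHeaderValue ("audio/wav");

			var response = await client.PostAsync (Endpoint, content);
			response.EnsureSuccessStatusCode ();

			// Returns the JSON body, e.g. {"RecognitionStatus":"InitialSilenceTimeout",...}
			return await response.Content.ReadAsStringAsync ();
		}
	}
}
```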

And here is the raw response:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: text/plain
X-MSEdge-Ref: Ref A: 7C038930369341C796495FD3A0494EBA Ref B: MRS01EDGE0313 Ref C: 2017-10-29T06:17:20Z
Date: Sun, 29 Oct 2017 06:17:20 GMT

4c
{"RecognitionStatus":"InitialSilenceTimeout","Offset":20000000,"Duration":0}
0
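For anyone parsing that body: the `4c` and `0` markers are just the chunked transfer encoding; the payload itself is a small JSON object. A sketch of a matching DTO, assuming Newtonsoft.Json is available:

```csharp
// Sketch of a DTO for the response body above (assumes Newtonsoft.Json is available).
using Newtonsoft.Json;

public class RecognitionResultDto
{
	// e.g. "Success", "InitialSilenceTimeout", ...
	public string RecognitionStatus { get; set; }

	// Populated only when recognition succeeds with format=simple.
	public string DisplayText { get; set; }

	// Offset and Duration appear to be 100-nanosecond ticks
	// (20000000 ≈ 2 s, which matches the ~2-second recording above).
	public long Offset { get; set; }
	public long Duration { get; set; }
}

// Usage:
// var result = JsonConvert.DeserializeObject<RecognitionResultDto> (json);
// if (result.RecognitionStatus != "Success")
//     Console.WriteLine ($"No text recognized: {result.RecognitionStatus}");
```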

Do you have any suggestions? Thanks

moonClimber commented 7 years ago

UPDATE: At https://docs.microsoft.com/en-us/azure/cognitive-services/speech/troubleshooting, in the section "The RecognitionStatus in the response is InitialSilenceTimeout", I found this note:

> the audio uses unsupported codec format, which makes the audio data be treated as silence

Now I'm going to check the format of the audio file and make some tests.
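In case it helps anyone doing the same check: a quick sketch that dumps the RIFF/WAVE header fields so the codec, channel count, sample rate, and bit depth can be compared against what the documentation lists as supported (if I recall correctly, mono PCM with 16-bit samples). The field offsets assume a canonical WAV header with the `fmt ` chunk immediately after `WAVE`:

```csharp
// Sketch: dump the WAV (RIFF) header fields so the codec and sample rate can be
// checked. Assumes a canonical header ("fmt " chunk right after "WAVE").
using System;
using System.IO;

class WavHeaderCheck
{
	public static void Dump (string path)
	{
		using (var reader = new BinaryReader (File.OpenRead (path)))
		{
			reader.ReadBytes (20);                      // "RIFF", size, "WAVE", "fmt ", chunk size
			short audioFormat   = reader.ReadInt16 ();  // 1 = PCM
			short channels      = reader.ReadInt16 ();  // 1 = mono
			int   sampleRate    = reader.ReadInt32 ();
			reader.ReadInt32 ();                        // byte rate
			reader.ReadInt16 ();                        // block align
			short bitsPerSample = reader.ReadInt16 ();

			Console.WriteLine ($"Format: {audioFormat}, Channels: {channels}, " +
				$"SampleRate: {sampleRate}, BitsPerSample: {bitsPerSample}");
		}
	}
}
```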

moonClimber commented 7 years ago

Ouch... shame on me :( I'll leave this thread here because it may be of help to other users. The problem was simply the volume level of the audio. The .wav file actually contained my voice, but at a very low level. Speaking at a higher volume was enough, and now everything works.
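If someone hits the same symptom, a quick way to confirm a too-quiet recording is to scan the PCM samples for the peak amplitude. A sketch, assuming a canonical 44-byte WAV header and 16-bit mono PCM data:

```csharp
// Sketch: find the peak amplitude of 16-bit PCM samples to spot recordings that
// are too quiet. Assumes a canonical 44-byte WAV header and 16-bit mono PCM data.
using System;
using System.IO;

class AudioLevelCheck
{
	public static double PeakLevel (string path)
	{
		var bytes = File.ReadAllBytes (path);
		int peak = 0;

		for (int i = 44; i + 1 < bytes.Length; i += 2)
		{
			int sample = Math.Abs ((int) BitConverter.ToInt16 (bytes, i));
			if (sample > peak)
				peak = sample;
		}

		// 1.0 = full scale; values close to 0 indicate a very quiet recording.
		return peak / 32768.0;
	}
}
```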