Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.

[BUG] CognitiveServices.Speech not waiting on InitialSilenceTimeoutMs before returning NoMatch #37601

Open roninstar opened 1 year ago

roninstar commented 1 year ago

Library name and version

Microsoft.CognitiveServices.Speech 1.30

Describe the bug

When using TranslationRecognizer we pass in the same config values that we do for SpeechRecognizer to allow a set amount of silence to be detected before the process stops. RecognizeOnceAsync comes back with NOMATCH after at most 4 to 8 seconds. But when we run the same code with SpeechRecognizer, it does wait the full 25 seconds of silence before stopping. Is there any way that TranslationRecognizer.RecognizeOnceAsync() can also respect that same config value?
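For reference, here is a minimal sketch of the equivalent SpeechRecognizer setup that does wait the full 25 seconds for us (the key, region, and default-microphone input are placeholders, and the snippet assumes it runs inside an async method):

    var speechConfig = SpeechConfig.FromSubscription("<YourSpeechKey>", "<YourRegion>");
    speechConfig.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "25000");

    using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
    using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

    // With a silent input, this call waits the full 25 seconds before returning a NoMatch result.
    var result = await recognizer.RecognizeOnceAsync();
    Console.WriteLine(result.Reason);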

Expected behavior

When setting speechTranslationConfig.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "25000"); on the SpeechTranslationConfig and calling recognizer.RecognizeOnceAsync(), we expect the Recognized event not to be raised for 25 seconds if there is silence for that long.

Actual behavior

What actually happens is that the Recognized event is triggered with the NoMatch ResultReason after only 4 to 8 seconds of silence.
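One way to see why the early NoMatch is produced is to inspect the SDK's NoMatchDetails inside the Recognized handler (a sketch against the recognizer from the reproduction steps below; NoMatchReason.InitialSilenceTimeout would confirm the service timed out waiting for speech, just much earlier than requested):

    recognizer.Recognized += (s, e) =>
    {
        if (e.Result.Reason == ResultReason.NoMatch)
        {
            // Reports whether the initial-silence timeout fired or the audio simply could not be recognized.
            var noMatch = NoMatchDetails.FromResult(e.Result);
            Console.WriteLine($"NOMATCH reason: {noMatch.Reason}");
        }
    };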

Reproduction Steps

            var region = CognitiveServicesSettings.AzureRegionWestUs;

            // Currently the v2 endpoint is required. In a future SDK release you won't need to set it.
            var endpointString = $"wss://{region}.stt.speech.microsoft.com/speech/universal/v2";
            var endpointUrl = new Uri(endpointString);

            var speechTranslationConfig =
                SpeechTranslationConfig.FromEndpoint(endpointUrl, CognitiveServicesSettings.AzureSpeechResourceKey);

            // Source language is required, but currently ignored. 
            string fromLanguage = "en-US";

            speechTranslationConfig.SpeechRecognitionLanguage = fromLanguage;

            speechTranslationConfig.AddTargetLanguage("en");
            speechTranslationConfig.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs,
                "25000");
            speechTranslationConfig.SetProperty(PropertyId.Conversation_Initial_Silence_Timeout,
                "25000");
            var autoDetectSourceLanguageConfig =
                AutoDetectSourceLanguageConfig.FromLanguages(new string[]
                    { "en-US", "fr-CA", "es-MX" }); //french,spanish,english

            _stopRecognition = new TaskCompletionSource<int>();
            var translationResult = "";
            byte channels = 1;
            byte bitsPerSample = 16;
            uint samplesPerSecond = 16000;
            var audioStreamFormat = AudioStreamFormat.GetWaveFormatPCM(samplesPerSecond, bitsPerSample, channels);

            _pushAudioInputStream = new PushAudioInputStream(audioStreamFormat);

            using var audioConfig = AudioConfig.FromStreamInput(_pushAudioInputStream);

                using (var recognizer = new TranslationRecognizer(
                           speechTranslationConfig,
                           autoDetectSourceLanguageConfig,
                           audioConfig))
                {
                // Subscribes to events.
                recognizer.Recognized += (s, e) =>
                {
                    switch (e.Result.Reason)
                    {
                        case ResultReason.TranslatedSpeech:

                            var languageDetected = e.Result.Properties.GetProperty(PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult);

                            foreach (var element in e.Result.Translations)
                            {
                                if (element.Key == LanguageMapper.ServerLanguage)
                                {
                                    translationResult = element.Value;
                                }
                            }

                                _currentSpeechCaptureResult.Text = translationResult;
                            break;
                        case ResultReason.NoMatch:
                        case ResultReason.Canceled:
                            // Nothing was recognized; e.Result.Text is empty here, so surface a message instead.
                            var resultStr = "NOMATCH: Speech could not be recognized.";
                            _currentSpeechCaptureResult.Text = resultStr;

                            break;
                        default:
                            break;
                    }
                };

                recognizer.Canceled += (s, e) =>
                {

                    var cancellation = CancellationDetails.FromResult(e.Result);

                    _stopRecognition.TrySetResult(0);
                };

                recognizer.SessionStarted += (s, e) =>
                {

                };

                recognizer.SessionStopped += (s, e) =>
                {

                    _stopRecognition.TrySetResult(0);
                };

                // Starts a single-shot recognition; results are delivered through the Recognized event above.
                recognizer.RecognizeOnceAsync().ConfigureAwait(false);
                // Waits for recognition to finish (or an error).
                // Use Task.WaitAny to keep the task rooted.
                Task.WaitAny(new[] { _stopRecognition.Task });
                }
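For debugging, a minimal awaited variant of the same single-shot call can also show how long the recognizer actually waits before giving up (a sketch reusing the speechTranslationConfig, autoDetectSourceLanguageConfig, and audioConfig built above, inside an async method):

    var stopwatch = System.Diagnostics.Stopwatch.StartNew();
    using (var recognizer = new TranslationRecognizer(
               speechTranslationConfig, autoDetectSourceLanguageConfig, audioConfig))
    {
        // If the 25-second initial-silence timeout were honored, a silent stream
        // should not produce a result before roughly 25 seconds have elapsed.
        var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);
        Console.WriteLine($"{result.Reason} after {stopwatch.Elapsed.TotalSeconds:F1}s");
    }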

Environment

Xamarin Android 8.1

jsquire commented 1 year ago

Hi @roninstar. Thank you for reaching out and we regret that you're experiencing difficulties. Can you help me understand which Azure SDK package you're using so that I can pull in the right folks to assist? There is no package named Azure.CognitiveServices.

github-actions[bot] commented 1 year ago

Hi @roninstar. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

roninstar commented 1 year ago

The GitHub repo for that is Azure-Samples/cognitive-services-speech-sdk.

On NuGet it's Microsoft.CognitiveServices.Speech.


jsquire commented 1 year ago

Thank you for the additional context. There is no Microsoft.CognitiveServices package; from the sample, it looks as if the package you're referring to is Microsoft.CognitiveServices.Speech, so I'm going to move forward under that assumption. That package is not open source and is not maintained in this repository. I'm going to tag this to bring it to the attention of the service team that maintains the package, but please be aware that they may not be actively monitoring this repo.

I'd suggest that you open an Azure support request to ensure that the right folks are engaged to assist, and your issue receives prompt attention.

github-actions[bot] commented 1 year ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @robch.