Open Quilljou opened 1 week ago
Hi, if your Speech SDK version is 1.25, then it's a very old version from January 2023 and you should upgrade to the current release (1.40.0 as of this writing).
You can use silence timeouts to end recognition. Please see the attached Python example (which is a lot faster to come up with...) that demonstrates the principle using microphone input: timeout.zip
There are two silence timeouts that can be controlled, initial and end (SPXPropertyId.speechServiceConnectionInitialSilenceTimeoutMs and SPXPropertyId.speechServiceConnectionEndSilenceTimeoutMs respectively, ref. SPXPropertyId).
For example, the sequence of events from the very beginning could be like
initial silence timeout
initial silence timeout
recognizing speech
recognized speech
end silence timeout
initial silence timeout
recognizing speech
...
Whenever either one of these silence timeouts occurs, there is a SpeechEndDetected event that you can subscribe to with addSpeechEndDetectedEventHandler. So if you want to automatically end recognition after silence of N seconds, whether before or after any speech has been recognized, set both timeouts to the (same) desired value (with setPropertyTo, example) and signal the end of recognition in the handler of SpeechEndDetected.
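A minimal Python sketch of the configuration step, assuming the `azure-cognitiveservices-speech` package is installed (in Swift the same two properties are set with setPropertyTo). The helper names `silence_timeout_ms` and `build_speech_config` are made up for illustration, and the SDK import is deferred into the function only so that the helper can be tried without the SDK present.

```python
def silence_timeout_ms(seconds: int) -> str:
    """Convert whole seconds to the millisecond string used as the property value."""
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("use a positive whole number of seconds")
    return str(seconds * 1000)


def build_speech_config(key: str, region: str, silence_seconds: int):
    """Return a SpeechConfig with both silence timeouts set to the same value."""
    # Deferred import: only needed when a real config is built.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.SpeechConfig(subscription=key, region=region)
    timeout = silence_timeout_ms(silence_seconds)
    # Initial timeout: silence before any speech (or between phrases after a timeout).
    config.set_property(
        speechsdk.PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, timeout)
    # End timeout: silence after the latest Recognized phrase.
    config.set_property(
        speechsdk.PropertyId.SpeechServiceConnection_EndSilenceTimeoutMs, timeout)
    return config
```

For a 3-second limit, `build_speech_config(key, region, 3)` would set both properties to "3000"; the key and region are placeholders for your own resource.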
Note that you should not call stopContinuousRecognition inside an event handler which is called by the SDK; instead, use some method to notify your application thread (similar to the Python example). Also, although the timeout values are in milliseconds, the actual moment when the timeout occurs can deviate from that by 100-300 ms depending on the service, network, etc., so it's better to use just full seconds and not expect millisecond accuracy.
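The notification pattern can be sketched without the SDK at all. In the sketch below, `FakeRecognizer` is a hypothetical stand-in (not a Speech SDK class) that fires its callback from a worker thread; the handler only sets a `threading.Event`, and the stop call happens on the application thread after the wait returns. With the real SDK you would connect the handler to the speech-end event and call the stop method in the same place.

```python
import threading


class FakeRecognizer:
    """Hypothetical stand-in for a recognizer; NOT part of the Speech SDK."""

    def __init__(self):
        self.on_speech_end = None  # handler, invoked on the worker thread
        self.stopped = False
        self._worker = None

    def start(self):
        # Deliver the callback from a separate thread, like an SDK-owned thread would.
        self._worker = threading.Thread(target=self._run)
        self._worker.start()

    def _run(self):
        if self.on_speech_end:
            self.on_speech_end()

    def stop(self):
        # This stand-in waits for its callback thread, so calling stop()
        # from inside the handler would fail here.
        self._worker.join()
        self.stopped = True


def run_until_silence(recognizer, max_wait_seconds=10.0):
    """Start recognition, wait for the handler's signal, stop on this thread."""
    done = threading.Event()
    recognizer.on_speech_end = done.set  # the handler only signals, nothing else
    recognizer.start()
    signalled = done.wait(timeout=max_wait_seconds)
    recognizer.stop()  # safe: we are on the application thread
    return signalled
```

The `max_wait_seconds` safety net is an extra assumption so the application does not block forever if no event ever arrives.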
Thanks so much. Is there also a way to know when the user starts speaking, some kind of start-speaking event? @pankopon
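Yes, there is addSpeechStartDetectedEventHandler (https://learn.microsoft.com/objectivec/cognitive-services/speech/spxrecognizer#addspeechstartdetectedeventhandler) for the SpeechStartDetected event. This can appear a bit earlier than the first Recognizing event.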
It seems this event does not correspond to the user starting to speak, at version 1.2.0.
Do you mean "speech start detected" is not what you expect with "user start speaking"? That's really the only indication of that kind available. If you mean it's not working with your Speech SDK installation, make sure to use a current Speech SDK release. 1.40.0 is the latest as of now.
Yes. I have now upgraded to version 1.4.0, but I still face two problems. First, I don't know when the user starts speaking: the SpeechStartDetected event arrives after the user has actually started speaking, so I have to do VAD myself. Second, I set speechServiceConnectionEndSilenceTimeoutMs to 3000, but I only get the SessionStopped event maybe 6 seconds after the user stops speaking.
If you want to know when speech is starting before audio is even sent to the service, then yes, currently you'll have to detect it in the application. SpeechStartDetected comes from the service when audio has been processed to the extent that the presence of speech has been confirmed (for real, not just that there is something other than silence).
Silence timeouts are only triggered by a fixed duration of silence. With EndSilenceTimeout of X seconds, it occurs ~X seconds of silence after the latest Recognized phrase; with InitialSilenceTimeout of Y seconds it occurs after ~Y seconds of silence anywhere else. Yes, "SpeechEndDetected" as the event name can be a bit misleading in that sense, but it's really just based on what's been configured for EndSilenceTimeout after Recognized speech. (With InitialSilenceTimeout it's even more misleading since there was no SpeechStartDetected... we may adjust naming of exposed events in the future.) The "phrase end detected" is when a Recognized phrase is reported. The condition for "user has stopped speaking and is not likely to continue" is up to you to decide, but silence timeouts are one way to detect it.
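To make the rule concrete, here is a toy Python model of it. This is only a sketch of the behavior described in this thread, not how the service is implemented: it assumes the input alternates between silence and speech segments, treats every speech segment as one Recognized phrase, and ignores the 100-300 ms jitter. The function name `timeout_events` is made up for illustration.

```python
def timeout_events(segments, initial_s, end_s):
    """Toy model: list events for alternating ("silence" | "speech", seconds) segments."""
    if initial_s <= 0 or end_s <= 0:
        raise ValueError("timeouts must be positive")
    events = []
    after_phrase = False  # True while the silence directly follows a Recognized phrase
    for kind, duration in segments:
        if kind == "speech":
            events += ["recognizing speech", "recognized speech"]
            after_phrase = True
            continue
        remaining = duration
        while True:
            # End timeout applies right after a phrase, initial timeout anywhere else.
            limit = end_s if after_phrase else initial_s
            if remaining < limit:
                break
            events.append("end silence timeout" if after_phrase
                          else "initial silence timeout")
            remaining -= limit
            after_phrase = False  # further silence counts as "anywhere else"
    return events
```

With both timeouts set to 3 seconds, `timeout_events([("silence", 7), ("speech", 2), ("silence", 7), ("speech", 1)], 3, 3)` reproduces the event sequence listed earlier in the thread: two initial silence timeouts, a recognized phrase, one end silence timeout, one more initial silence timeout, then speech again.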
Thanks for replying. That means I can't know when the user starts speaking after silence, and I also can't know when the user stops speaking, using the capabilities of the Speech SDK alone. I have to implement VAD myself on the device.
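For reference, one naive way to do on-device detection is an energy threshold over short audio frames. The Python sketch below is only an illustration of that idea and has nothing to do with the Speech SDK: it assumes 16-bit little-endian mono PCM frames, and the threshold of 500 and the run length of 3 frames are arbitrary values that would need tuning per device and environment. A production VAD would usually be more robust than this.

```python
import math
import struct


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of one frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def first_speech_frame(frames, threshold=500.0, min_frames=3):
    """Index of the first frame in a run of min_frames loud frames, or None."""
    run = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            run += 1
            if run == min_frames:
                return i - min_frames + 1  # start of the loud run
        else:
            run = 0  # a quiet frame resets the run
    return None
```

Requiring several consecutive loud frames is a simple guard against clicks and other short noises being taken for speech; the same loop with the comparison inverted gives a rough "user stopped speaking" signal.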
I am using Swift pod 'MicrosoftCognitiveServicesSpeech-iOS', '~> 1.25' for continuous speech recognition. I want to implement a feature where the recognition automatically stops if the user doesn't speak for N seconds after it starts. What is the best practice for this? How can I achieve it?
This is my current implementation, but sometimes it ends the recognition early.