abeulich closed this issue 6 years ago.
Unfortunately this isn't something that Sonus controls; it depends on which cloud speech recognizer you use. In the case of Google Cloud Speech, we wait for the `isFinal` flag before sending final results.
One really hacky thing you could try (which I wouldn't really recommend) is attaching a timer to your partial results: if the partial results haven't changed by the time the timer fires (and the recognized words are above a confidence threshold), you process those results.
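A minimal TypeScript sketch of the timer idea, assuming each partial result reaches your code as a transcript plus some numeric score. `StablePartialDetector`, `update`, `tick`, `reset`, and the `score` field are hypothetical names, not part of Sonus. As far as I know, Google Cloud Speech attaches a `stability` estimate (rather than a `confidence`) to interim results, so that may be the value to feed in as `score`. Time is passed in explicitly so the same logic can be driven by a `setInterval` in a real app or tested without real timers.

```typescript
// Hypothetical shape of a partial result; adapt it to your recognizer's payload.
interface PartialResult {
  transcript: string;
  score: number; // assumed to be in [0, 1], e.g. stability or confidence
}

// Processes a partial transcript once it has stayed unchanged for
// `stableMs` milliseconds and its score meets `minScore`.
class StablePartialDetector {
  private stableMs: number;
  private minScore: number;
  private onStable: (result: PartialResult) => void;
  private last: PartialResult | null = null;
  private lastChangeMs = 0;
  private fired = false;

  constructor(stableMs: number, minScore: number, onStable: (result: PartialResult) => void) {
    this.stableMs = stableMs;
    this.minScore = minScore;
    this.onStable = onStable;
  }

  // Call for every incoming partial result; a new transcript restarts the clock.
  update(result: PartialResult, nowMs: number): void {
    if (this.last === null || result.transcript !== this.last.transcript) {
      this.lastChangeMs = nowMs;
      this.fired = false;
    }
    this.last = result; // keep the latest score even if the text is unchanged
  }

  // Call periodically (e.g. from setInterval); fires at most once per transcript.
  tick(nowMs: number): void {
    if (this.last === null || this.fired) return;
    const unchangedFor = nowMs - this.lastChangeMs;
    if (unchangedFor >= this.stableMs && this.last.score >= this.minScore) {
      this.fired = true;
      this.onStable(this.last);
    }
  }

  // Call when the real final result arrives so the next utterance starts clean.
  reset(): void {
    this.last = null;
    this.fired = false;
  }
}

// Usage with made-up timestamps (milliseconds).
const processed: string[] = [];
const detector = new StablePartialDetector(400, 0.8, (r) => {
  processed.push(r.transcript);
});
detector.update({ transcript: "turn on", score: 0.6 }, 0);
detector.update({ transcript: "turn on the lights", score: 0.9 }, 200);
detector.tick(500); // unchanged for 300 ms: keep waiting
detector.tick(650); // unchanged for 450 ms with score 0.9: processed
console.log(processed);
```

If the detector has already acted, the app would also need to skip the final result for that utterance, otherwise the same command is handled twice. Choosing `stableMs` is the accuracy trade-off in miniature: shorter feels snappier but risks acting on a transcript the recognizer would still have revised.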
Another option would be to have a "stop" word that, once recognized, fires whatever interim results you have at that point, but again, this feels hacky to me.
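The stop-word variant could look roughly like this, again assuming the partial result reaches your code as a plain transcript string. The stop word `"over"` and the function name `commandBeforeStopWord` are placeholders chosen for illustration.

```typescript
// Hypothetical stop word; pick one that never occurs inside a real command.
const STOP_WORD = "over";

// Returns the words spoken before `stopWord` once it shows up in a partial
// transcript, or null if it hasn't been recognized yet (keep waiting).
function commandBeforeStopWord(partial: string, stopWord: string): string | null {
  const words = partial
    .toLowerCase()
    .replace(/[.,!?]/g, "") // drop punctuation some recognizers add
    .split(/\s+/)
    .filter((w) => w.length > 0);
  const i = words.indexOf(stopWord.toLowerCase());
  if (i <= 0) return null; // stop word missing, or nothing said before it
  return words.slice(0, i).join(" ");
}

console.log(commandBeforeStopWord("turn on the", STOP_WORD)); // null
console.log(commandBeforeStopWord("Turn on the lights over.", STOP_WORD)); // turn on the lights
```

Part of what makes this hacky is visible in the code: no command may contain the stop word, and the user has to remember to say it every time.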
You might be better off asking this question on the Google Cloud Speech repo.
Hi Evan, many thanks for taking the time to comment on this. I understand it's most likely the processing of the captured audio, and waiting for the recognition results, that makes it feel a bit slow to me.
My understanding of the process (without partial results) is the following though:
I was hoping to save some time while Sonus records my voice command (Step 2). I don't understand how Sonus detects that I'm done speaking. I guess it detects pauses, and once a pause is long enough it's treated as the end of what I wanted to say?
Is this assumption correct, and is there a place where I could make Sonus more "aggressive" about ending the recording?
Many thanks again, Alex
I just played with the partial results and realized they arrive while I'm still speaking. So I guess audio is already being sent to the cloud service while I speak (?), and I underestimated how well Sonus is optimized. :)
My code now also looks at partial results (I was ignoring them before), and for most voice commands this is faster than waiting for the final result.
Still looking forward to any further comments, but you can also close this "issue" if you want.
Correct: after the hotword is detected, your audio is streamed to the cloud service, but only until the cloud service detects the end of the utterance (at which point the final-result event is fired and audio stops being streamed).
Beyond improving the speed of the cloud speech recognizer, there's not much that can be done without sacrificing accuracy.
Thanks again. I'll keep improving the code that processes the results. It's already faster since I started matching partial results against the commands I'm looking for. :)
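Matching partial results against a fixed set of commands might look something like this sketch. It assumes the partial transcript is a plain string; the command phrases, action names, and the helpers `normalize` and `matchCommand` are made up for a home-automation example. A command is accepted as soon as its whole phrase appears in the partial transcript, word for word.

```typescript
// Hypothetical command table: spoken phrase -> action name.
const COMMANDS: Record<string, string> = {
  "turn on the lights": "lights:on",
  "turn off the lights": "lights:off",
  "open the blinds": "blinds:open",
};

// Lowercase, strip common punctuation, and collapse whitespace.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[.,!?]/g, "")
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .join(" ");
}

// Returns the action for the first command phrase contained (on word
// boundaries) in the partial transcript, or null if none matches yet.
function matchCommand(partial: string, commands: Record<string, string>): string | null {
  const padded = " " + normalize(partial) + " ";
  for (const phrase of Object.keys(commands)) {
    if (padded.indexOf(" " + phrase + " ") !== -1) {
      return commands[phrase];
    }
  }
  return null;
}

console.log(matchCommand("turn on the", COMMANDS)); // null
console.log(matchCommand("Turn on the lights", COMMANDS)); // lights:on
```

Since partial results keep arriving after a match, a flag is needed so the action fires only once per utterance. Acting early also means something like "turn off the lights in ten minutes" triggers as soon as "turn off the lights" is heard, so this works best with short, unambiguous phrases.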
Hi, first of all I'm very happy with how Sonus works, but I'd like to make it a bit faster if possible. Thinking about it now, I'm not sure whether the "delay" I'm noticing actually comes from the processing time in the cloud speech recognition service?!
Anyway, is there a way to make Sonus consider an utterance finished more quickly? In my use case I say the hotword, it gets recognized, I play a confirmation sound, and then I say a simple voice command for my home automation.
I have a feeling Sonus waits a bit too long before it considers my voice command finished. Can this timeout be shortened?
Thanks in advance, Alex