alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0
7.83k stars, 1.09k forks

feature request: voice activity detection #184

Open: hyansuper opened this issue 4 years ago

hyansuper commented 4 years ago

I want to end speech recognition and call FinalResult() when silence lasts longer than a timeout parameter. The pocketsphinx-python library has a get_in_speech() function that seems to do voice activity detection (VAD); maybe we can implement a similar function. Python pseudocode (get_in_speech() is the proposed method, it does not exist in Vosk):

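def recognize_until_silence(recognizer, stream, timeout, samplerate):
    silence = 0.0  # seconds of continuous silence
    while True:
        data = stream.read(4000)
        if len(data) == 0:  # end of stream
            return recognizer.FinalResult()
        if recognizer.AcceptWaveform(data):
            return recognizer.Result()
        # get_in_speech() is the proposed method; Vosk does not have it yet
        elif not recognizer.get_in_speech():
            # 16-bit mono PCM: 2 bytes per sample
            silence += len(data) / (2 * samplerate)
            if silence > timeout:
                return recognizer.FinalResult()
        else:
            silence = 0.0  # speech resumed, restart the timeout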

I noticed there is a private function, UpdateSilenceWeights, but I don't know whether it is related to voice activity detection.
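Until something like get_in_speech() exists, a crude stand-in is a plain energy (RMS) threshold on the raw audio, which is much less robust than a trained VAD. A minimal sketch, assuming 16-bit little-endian mono PCM chunks of even length (the format the Vosk examples use); `SilenceDetector` and the default threshold of 500 are hypothetical choices, not part of Vosk or pocketsphinx, and the threshold needs tuning for your microphone and noise level.

```python
import array
import math
import sys


class SilenceDetector:
    """Hypothetical energy-based stand-in for the proposed get_in_speech()."""

    def __init__(self, sample_rate, timeout, threshold=500.0):
        self.sample_rate = sample_rate  # samples per second (mono)
        self.timeout = timeout          # seconds of silence before giving up
        self.threshold = threshold      # RMS level treated as speech (assumed; tune it)
        self.silence = 0.0              # current run of silence, in seconds

    def is_speech(self, chunk):
        """Return True if a chunk of 16-bit mono PCM is louder than the threshold."""
        samples = array.array('h')
        samples.frombytes(chunk)
        if sys.byteorder == 'big':
            samples.byteswap()  # the PCM data is little-endian
        if not samples:
            return False
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        return rms > self.threshold

    def update(self, chunk):
        """Feed one chunk; return True once silence has lasted longer than timeout."""
        if self.is_speech(chunk):
            self.silence = 0.0
        else:
            self.silence += len(chunk) / 2 / self.sample_rate  # 2 bytes per sample
        return self.silence > self.timeout
```

In the polling loop, call `detector.update(data)` whenever `AcceptWaveform(data)` returns False, and call `FinalResult()` once it returns True.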

hyansuper commented 4 years ago

Or maybe: if PartialResult() does not change, does that mean silence?

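import json

def recognize_until_silence(recognizer, stream, timeout, samplerate):
    partial = ''
    silence = 0.0  # seconds since the partial result last changed
    while True:
        data = stream.read(4000)
        if len(data) == 0:  # end of stream
            return recognizer.FinalResult()
        if recognizer.AcceptWaveform(data):
            return recognizer.Result()
        # PartialResult() returns a JSON string, so parse it once per chunk
        current = json.loads(recognizer.PartialResult())['partial']
        if current != partial:
            # new words were decoded, so there is still speech
            partial = current
            silence = 0.0
        else:
            # partial unchanged, so treat this chunk as silence (16-bit mono PCM)
            silence += len(data) / (2 * samplerate)
            if silence > timeout:
                return recognizer.FinalResult()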
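The "partial result unchanged" heuristic can be pulled out of the loop into a small timer that is easy to test without a microphone. A minimal sketch: `PartialSilenceTimer` and `chunk_seconds` are hypothetical helpers, 16-bit mono PCM is assumed, and the chunk duration is computed from the byte length of the data, so it does not matter whether `stream.read(4000)` counted frames (as PyAudio does) or bytes.

```python
import json


def chunk_seconds(num_bytes, sample_rate, sample_width=2, channels=1):
    """Duration of a raw PCM chunk: bytes / (bytes per sample * channels * rate)."""
    return num_bytes / (sample_width * channels * sample_rate)


class PartialSilenceTimer:
    """Hypothetical helper: measures how long PartialResult() has stayed unchanged."""

    def __init__(self, sample_rate, timeout):
        self.sample_rate = sample_rate
        self.timeout = timeout  # seconds
        self.partial = ''       # last partial text seen
        self.silence = 0.0      # seconds since the partial text last changed

    def update(self, partial_json, num_bytes):
        """partial_json is the string returned by PartialResult(); num_bytes is
        the size of the chunk just fed to AcceptWaveform().

        Returns True once the partial text has been unchanged for longer than timeout.
        """
        current = json.loads(partial_json).get('partial', '')
        if current != self.partial:
            self.partial = current
            self.silence = 0.0
        else:
            self.silence += chunk_seconds(num_bytes, self.sample_rate)
        return self.silence > self.timeout
```

Inside the loop, call `timer.update(recognizer.PartialResult(), len(data))` whenever `AcceptWaveform(data)` returns False. Like the snippet above, the timer starts counting before anything has been said, so a long silence before the first word also triggers the timeout.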

gormonn commented 4 years ago

Off-topic, but it might be useful: on the client side (browser/Electron) I am using hark.

That said, this is probably not the best approach, since you can write your own AudioWorklet.

madkote commented 4 years ago

@hyansuper Your suggestion "if PartialResult() does not change, then it means silence" works for me.