Inconsistent voice activity detection

chrisspen commented 7 years ago

Is there any built-in voice activity detection, or are you relying on the backend for that? It seems to be inconsistent or non-existent, and this makes the microphone examples very difficult to use. Testing with the Google Speech backend, after a few seconds of speech, minutes could go by with nothing happening, leaving me to wonder: is my mic not working? Did the package fail to detect speech and never begin recording? Is it still recording but hasn't detected the end of the speech? Or did it send everything up to Google and it's just taking a while? Is there any way to get debugging output so we know which stage it's at?

martin357 commented 7 years ago

Same problem here.

chrisspen commented 7 years ago

I seem to have gotten it to work more reliably. I think the problem was with my laptop microphone's amplification. It was set to 100%, which picks up a ton of noise. The "sound energy" indicator was registering around 20 bars even when I wasn't speaking. I reduced the amplification to around 20-30%, so when I wasn't speaking, only 1-2 bars were registering, and when I was speaking, around 10-15 bars registered.

With that, this script worked for me:

import sys
import time
import speech_recognition as sr

# You may need to change this to suit your hardware. None will work, but may give you ALSA warnings.
mic_index = 4

r = sr.Recognizer()
m = sr.Microphone(device_index=mic_index)

try:
    with m as source:
        r.adjust_for_ambient_noise(source)
    print("Set minimum energy threshold to {}".format(r.energy_threshold))
except IOError as e:
    print(e)
    print('Try a different microphone device index: {}'.format(sr.Microphone.list_microphone_names()))
    sys.exit(1)

do_prompt = True
while 1:

    try:

        if do_prompt:
            print("Listening!")
            do_prompt = False

        with sr.Microphone(device_index=mic_index) as source:
            audio = r.listen(source, timeout=0.1)
            do_prompt = True

        t0 = time.time()
        print('Recognizing...')
        text = r.recognize_google(audio)
        td = time.time() - t0
        print('Response seconds: {}'.format(td))

        print(text)
    except sr.UnknownValueError:
        # We heard you, but Google couldn't figure out what you said.
        pass
    except sr.WaitTimeoutError:
        pass
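
To illustrate why lowering the amplification helps, here is a rough sketch of energy-based detection: the loudness (RMS) of each audio chunk is compared against a threshold such as r.energy_threshold. As far as I know the library does something along these lines internally, but this is not its actual code. The helper names (rms_energy, is_speech, make_tone) and the sample values are made up for the example, and the input is assumed to be 16-bit little-endian mono PCM.

```python
import math
import struct


def rms_energy(frame_bytes):
    """Root-mean-square amplitude of 16-bit little-endian mono PCM samples."""
    count = len(frame_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<{}h".format(count), frame_bytes[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame_bytes, energy_threshold):
    """Treat a chunk as speech when its RMS energy exceeds the threshold."""
    return rms_energy(frame_bytes) > energy_threshold


def make_tone(amplitude, n=1600, freq=440.0, rate=16000):
    """Hypothetical test signal: a sine wave standing in for microphone input."""
    values = [int(amplitude * math.sin(2 * math.pi * freq * i / rate))
              for i in range(n)]
    return struct.pack("<{}h".format(n), *values)


threshold = 300                      # assumed threshold, chosen for illustration
background = make_tone(100)          # quiet room at low gain
boosted_background = make_tone(1000) # same room with the gain turned up 10x
voice = make_tone(5000)              # someone speaking

for label, chunk in [("background", background),
                     ("boosted background", boosted_background),
                     ("voice", voice)]:
    print("{}: energy={:.0f} speech={}".format(
        label, rms_energy(chunk), is_speech(chunk, threshold)))
```

With the gain turned up, the background alone crosses the threshold, so the detector never sees silence and never decides the phrase has ended. That matches the "20 bars when I wasn't speaking" symptom described above.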

martin357 commented 7 years ago

Thank you, I tried lessening the amplification as you described and it works much more reliably now.
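
On the original question about debugging output: I don't know of a built-in stage indicator, but a small wrapper can at least show which call the script is sitting in. This is only a sketch. run_stage is a hypothetical helper, not part of speech_recognition, and it only brackets whole calls, so it cannot tell "waiting for speech to start" apart from "recording, waiting for the end of speech" inside listen().

```python
import time


def run_stage(name, func, *args, **kwargs):
    """Call func, printing when the stage starts and how long it took."""
    print("[{}] started".format(name))
    t0 = time.time()
    try:
        return func(*args, **kwargs)
    finally:
        # Runs even if func raises (e.g. a timeout), so the stage always closes.
        print("[{}] finished after {:.2f}s".format(name, time.time() - t0))


# Stand-in function so the sketch runs without a microphone.
def fake_recognize(audio):
    time.sleep(0.05)
    return "hello world"


print(run_stage("recognize", fake_recognize, b"fake audio bytes"))
```

In the script above it would be used as audio = run_stage("listen", r.listen, source, timeout=0.1) and text = run_stage("recognize", r.recognize_google, audio). If "listen" starts and never finishes, the recognizer is still waiting on the microphone; if "recognize" is the one hanging, the delay is on the network side.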

Uberi / speech_recognition

Inconsistent voice activity detection #181