Uberi / speech_recognition

Speech recognition module for Python, supporting several engines and APIs, online and offline.
https://pypi.python.org/pypi/SpeechRecognition/
BSD 3-Clause "New" or "Revised" License
8.46k stars 2.4k forks

Inconsistent voice activity detection #181

Closed: chrisspen closed 7 years ago

chrisspen commented 7 years ago

Is there any built-in voice activity detection, or are you relying on the backend for that? It seems to be inconsistent or non-existent, which makes the microphone examples very difficult to use. Testing with the Google Speech backend, after a few seconds of speech, minutes could go by with nothing happening, leaving me to wonder: Is my mic not working? Did the package fail to detect speech and never begin recording? Is it still recording because it hasn't detected the end of the speech? Or did it send everything up to Google, and the response is just taking a while? Is there any way to get debugging output so we know which stage it's at?

martin357 commented 7 years ago

Same problem here.

chrisspen commented 7 years ago

I seem to have gotten it to work more reliably. I think the problem was with my laptop microphone's amplification. It was set to 100%, which picked up a ton of noise. The "sound energy" indicator was registering around 20 bars even when I wasn't speaking. I reduced the amplification to around 20-30%, so when I wasn't speaking, only 1-2 bars were registering, and when I was speaking, around 10-15 bars registered.

With that, this script worked for me:

import sys
import time
import speech_recognition as sr

# You may need to change this to suit your hardware. None selects the default microphone, but may give you ALSA warnings.
mic_index = 4

r = sr.Recognizer()
m = sr.Microphone(device_index=mic_index)

try:
    # Calibrate the energy threshold against ambient noise before listening.
    with m as source:
        r.adjust_for_ambient_noise(source)
    print("Set minimum energy threshold to {}".format(r.energy_threshold))
except IOError as e:
    print(e)
    print('Try a different microphone device index:', sr.Microphone.list_microphone_names())
    sys.exit(1)

do_prompt = True
while True:
    try:
        if do_prompt:
            print("Listening!")
            do_prompt = False

        with sr.Microphone(device_index=mic_index) as source:
            # Wait at most 0.1 s for speech to start; otherwise WaitTimeoutError is raised.
            audio = r.listen(source, timeout=0.1)
            do_prompt = True

        # Time the round trip to the recognition backend.
        t0 = time.time()
        print('Recognizing...')
        text = r.recognize_google(audio)
        td = time.time() - t0
        print('Response seconds:', td)

        print(text)
    except sr.UnknownValueError:
        # We heard you, but Google couldn't figure out what you said.
        pass
    except sr.WaitTimeoutError:
        # No speech started within the timeout; loop around and listen again.
        pass

martin357 commented 7 years ago

Thank you, I tried lessening the amplification as you described and it works much more reliably now.
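
For anyone wondering why the microphone gain matters so much here, below is a minimal sketch of energy-threshold voice activity detection. It is an illustration under stated assumptions, not the library's actual implementation: the helpers `rms`, `detect_speech`, and `make_frame`, the amplitude values, and the threshold of 300 are all hypothetical. The idea is that each audio frame counts as speech only when its RMS energy exceeds a threshold, so if background noise alone sits above the threshold, the detector never sees silence and never decides the phrase has ended.

```python
import math
import struct


def rms(frame):
    """Root-mean-square amplitude of 16-bit little-endian mono PCM bytes."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<{}h".format(count), frame[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def detect_speech(frames, energy_threshold):
    """Return one flag per frame: True when the frame's energy exceeds the threshold."""
    return [rms(frame) > energy_threshold for frame in frames]


def make_frame(amplitude, n=160):
    """Hypothetical test helper: a square wave whose RMS equals `amplitude`."""
    samples = [amplitude if i % 2 == 0 else -amplitude for i in range(n)]
    return struct.pack("<{}h".format(n), *samples)


# Illustrative amplitudes only; real values depend on the hardware.
low_gain_noise = make_frame(100)    # background noise at reduced amplification
high_gain_noise = make_frame(2000)  # the same background at 100% amplification
speech = make_frame(3000)

threshold = 300  # assumed value for this sketch

# At low gain the silence before and after speech is detected.
print(detect_speech([low_gain_noise, speech, low_gain_noise], threshold))
# At high gain every frame looks like speech, so the phrase never "ends".
print(detect_speech([high_gain_noise, speech, high_gain_noise], threshold))
```

In the second case every frame is above the threshold, which would match the symptom of minutes going by with nothing happening. Lowering the amplification, or raising the threshold, restores the gap between noise and speech.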