Uberi / speech_recognition

Speech recognition module for Python, supporting several engines and APIs, online and offline.
https://pypi.python.org/pypi/SpeechRecognition/
BSD 3-Clause "New" or "Revised" License

Problem with non-stop listening (tried dynamic_energy_threshold) #546

Open AlexMnatsakanian opened 3 years ago

AlexMnatsakanian commented 3 years ago

I am currently trying to make a virtual assistant, and I am having some problems with .listen(source): it keeps listening for a very long time after I say something. So I tried using .record instead, and it worked better because it always stops listening after 5 seconds. The problem is that you do not know when to start talking: if you say something in the middle of the loop rather than right when it starts listening, it won't capture the audio.

After reading the docs to solve the problem, I tried setting listener.dynamic_energy_threshold = 30000. I do not know if I can set the threshold above the values they recommend, but I tried, and it made things better, though the issue still happens.

Keep in mind that I have a Blue Yeti, a very sensitive mic.

    listener.dynamic_energy_threshold = 30000
    listener.pause_threshold = 0.7

    with sr.Microphone() as source:
        print('[-] listening...')
        voice = listener.listen(source, timeout=5)
        command = listener.recognize_google(voice)
        command = command.lower()
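A note on the snippet above: in SpeechRecognition, the timeout argument of listen only bounds how long to wait for a phrase to start, while phrase_time_limit bounds how long a phrase may run once it has started. The following is a minimal sketch, assuming a hard cap on phrase length is acceptable for the assistant. The names listen_bounded and main are hypothetical, and the 5-second values are illustrative rather than recommended.

```python
def listen_bounded(recognizer, source, wait_timeout=5, phrase_limit=5):
    """Capture one phrase, bounding both the wait and the phrase length.

    wait_timeout: max seconds to wait for speech to begin.
    phrase_limit: max seconds a phrase may last once it has begun.
    """
    return recognizer.listen(
        source, timeout=wait_timeout, phrase_time_limit=phrase_limit
    )


def main():
    # Imported here so the helper above can be read without a microphone.
    # Requires the SpeechRecognition and PyAudio packages.
    import speech_recognition as sr

    listener = sr.Recognizer()
    listener.pause_threshold = 0.7
    with sr.Microphone() as source:
        # Let the recognizer measure room noise before listening.
        listener.adjust_for_ambient_noise(source, duration=1)
        print('[-] listening...')
        try:
            voice = listen_bounded(listener, source)
        except sr.WaitTimeoutError:
            print('[-] no speech started within the timeout')
            return None
    try:
        return listener.recognize_google(voice).lower()
    except sr.UnknownValueError:
        print('[-] speech was not understood')
    except sr.RequestError as error:
        print(f'[-] recognition request failed: {error}')
    return None
```

Calling main() with a microphone attached runs one capture. Whether this removes the long waits on a very sensitive microphone is untested here; it only guarantees that a started phrase is cut off after phrase_limit seconds.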

holocronweaver commented 3 years ago

I am also having this issue intermittently, and I have no issue with other voice recognition applications. It usually occurs when I first start the application and call listen for the first time: it hangs indefinitely, ignoring the timeout I set.

    _recognizer = sr.Recognizer()
    mic = sr.Microphone()
    with mic as source:
        _recognizer.dynamic_energy_threshold = True
        _recognizer.energy_threshold = 100
        _recognizer.pause_threshold = 0.7
        _recognizer.adjust_for_ambient_noise(source, duration=1)
        print('Listening for audio...')
        audio = _recognizer.listen(source, timeout=5)
        print('Transcribing audio...')
        transcription = _transcribe_audio(audio)

In this example, 'Transcribing audio...' is either never printed or is printed only after a long delay.

Usually, after the first listen call completes, the next couple of calls return more quickly, but the issue soon returns: the 3rd or 4th listen call either hangs indefinitely or waits a long time in silence (> 10 secs). I suspect something is going wrong with the dynamic energy threshold, but setting it to False does not help. I also tried playing with energy_threshold and pause_threshold, as well as adjust_for_ambient_noise, with no luck.
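One possible explanation for a listen call that never returns is that the background level stays above the energy threshold, so the recognizer never sees a quiet stretch long enough to count as the end of a phrase. The sketch below is a deliberately simplified model of that idea, not the library's actual implementation: phrase_end_index, the chunk length, and the energy readings are all hypothetical, and dynamic threshold adjustment is not modeled.

```python
import math


def phrase_end_index(energies, energy_threshold,
                     pause_threshold=0.7, chunk_seconds=0.1):
    """Return the index of the chunk where a phrase would be considered
    finished, or None if the quiet stretch never gets long enough."""
    # Number of consecutive quiet chunks needed to end the phrase.
    quiet_needed = math.ceil(round(pause_threshold / chunk_seconds, 6))
    speaking = False
    quiet_chunks = 0
    for index, energy in enumerate(energies):
        if energy > energy_threshold:
            # Loud chunk: treated as speech, so the quiet counter resets.
            speaking = True
            quiet_chunks = 0
        elif speaking:
            quiet_chunks += 1
            if quiet_chunks >= quiet_needed:
                return index
    return None


# Hypothetical readings: room noise near 400, speech near 3000.
readings = [400, 3000, 3000] + [400] * 10
low = phrase_end_index(readings, energy_threshold=100)    # None
high = phrase_end_index(readings, energy_threshold=1000)  # 9
```

In this toy model, a threshold of 100 sits below the room noise, so every chunk counts as speech and the phrase never ends. A threshold of 1000 lets the phrase end after seven quiet chunks. Whether this matches what happens in the report above would need to be checked by printing energy_threshold during a hang.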

@Shentikai A minor point: dynamic_energy_threshold is a boolean. I think you want energy_threshold, which sets an initial value to speed up adjustment.
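To make the distinction concrete, here is a minimal sketch that keeps the on/off switch and the numeric level in separate attributes. The helper configure_thresholds and its default values are hypothetical. The type check is an addition of this sketch and not something the library performs: since Python treats any non-zero number as true, assigning 30000 to dynamic_energy_threshold would most likely just leave dynamic adjustment switched on.

```python
def configure_thresholds(recognizer, initial_energy=4000,
                         dynamic=True, pause=0.7):
    """Set the three attributes discussed in this thread on a recognizer.

    initial_energy: starting numeric level for energy_threshold.
    dynamic: True/False switch for automatic threshold adjustment.
    pause: seconds of quiet that mark the end of a phrase.
    """
    if not isinstance(dynamic, bool):
        # Guard added for illustration only; the library has no such check.
        raise TypeError(
            'dynamic_energy_threshold expects True or False; '
            'put numeric levels in energy_threshold'
        )
    recognizer.dynamic_energy_threshold = dynamic
    recognizer.energy_threshold = initial_energy
    recognizer.pause_threshold = pause
    return recognizer
```

With the real library this would be called as configure_thresholds(sr.Recognizer(), initial_energy=4000), where 4000 is only a placeholder to be tuned for the microphone in use.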