Uberi / speech_recognition

Speech recognition module for Python, supporting several engines and APIs, online and offline.
https://pypi.python.org/pypi/SpeechRecognition/
BSD 3-Clause "New" or "Revised" License

Saved audio recorded with SR plays choppy and too fast #646

Open antimatter84 opened 1 year ago

antimatter84 commented 1 year ago

Steps to reproduce

  1. Record audio from a USB audio interface (Focusrite Scarlett) with a Microphone() instance
  2. Save it to a file with Python's wave module

Here is example code that shows what I do (pieced together from the actual source):

import speech_recognition as sr
import wave

mic_index = 7  # focusrite scarlett input

recognizer = sr.Recognizer()
mic = sr.Microphone(device_index=mic_index)

print('Recording...')
with mic as source:
    recognizer.adjust_for_ambient_noise(source, duration=0.2)
    audio = recognizer.listen(source, timeout=1, phrase_time_limit=5)

wave_file = wave.open('audiotest.wav', 'wb')
wave_file.setnchannels(1)
wave_file.setsampwidth(2)
wave_file.setframerate(16000)
wave_file.writeframes(audio.get_wav_data(convert_rate=16000))
wave_file.close()
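
One detail in the snippet above may be worth ruling out: get_wav_data() returns the bytes of a complete WAV file, header included, while writeframes() expects headerless PCM frames. Passing one to the other embeds a second header in the audio data. That alone probably does not explain the tempo problem, but a minimal sketch of writing raw frames could look like the following. The helper names save_pcm_as_wav and make_test_tone and the output path are made up for illustration, and the generated tone only stands in for a real recording.

```python
import math
import os
import struct
import tempfile
import wave


def save_pcm_as_wav(path, raw_frames, sample_rate, sample_width=2, channels=1):
    """Write headerless PCM bytes to a WAV file; the wave module adds the header."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(raw_frames)


def make_test_tone(sample_rate, seconds=1.0, freq=440.0):
    """Generate 16-bit mono sine-wave PCM as a stand-in for recorded audio."""
    n = int(sample_rate * seconds)
    samples = (int(12000 * math.sin(2 * math.pi * freq * i / sample_rate))
               for i in range(n))
    return b"".join(struct.pack("<h", s) for s in samples)


# With a real recording, the raw bytes would come from the AudioData object:
#     raw = audio.get_raw_data(convert_rate=16000, convert_width=2)
raw = make_test_tone(16000)
out_path = os.path.join(tempfile.gettempdir(), "tone_test.wav")  # hypothetical path
save_pcm_as_wav(out_path, raw, sample_rate=16000)
```

Alternatively, the bytes returned by get_wav_data() can be written straight to a file opened in binary mode, without going through the wave module at all.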

Expected behaviour

The written wave file should sound like the original audio source: clean and at the correct tempo.

Actual behaviour

The written wave file sounds somewhat choppy and plays much too fast. Attached: audiotest.wav.zip

Recording audio from the device with arecord -D plughw:1,0 -f cd -d 5 alsatest.wav produces a clean result.
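
To compare the two recordings objectively, it may help to read back their headers. The sketch below uses only the standard wave module; describe_wav is a hypothetical helper name. For reference, arecord's -f cd format means 16-bit little-endian samples at 44100 Hz in stereo, so alsatest.wav should report those values and a duration of about 5 seconds.

```python
import wave


def describe_wav(path):
    """Return the header fields of a WAV file plus the duration they imply."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


# Example usage with the two files from this report:
#     print(describe_wav("alsatest.wav"))
#     print(describe_wav("audiotest.wav"))
```

If audiotest.wav reports a duration clearly shorter than the time actually spent speaking, that would suggest samples are being dropped during capture or that the rate in the header does not match the rate of the data.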

System information


My system is Linux Mint 20.3 Cinnamon.

My Python version is 3.8.10.

My Pip version is 20.0.2.

My SpeechRecognition library version is 3.9.0.

My PyAudio library version is 0.2.13.

My microphones are:

HDA NVidia: HDMI 0 (hw:0,3)
HDA NVidia: HDMI 1 (hw:0,7)
HDA NVidia: HDMI 2 (hw:0,8)
HDA NVidia: HDMI 3 (hw:0,9)
HDA NVidia: HDMI 4 (hw:0,10)
HDA NVidia: HDMI 5 (hw:0,11)
HDA NVidia: HDMI 6 (hw:0,12)
Scarlett 2i2 USB: Audio (hw:1,0)
HD-Audio Generic: ALC1220 Analog (hw:2,0)
HD-Audio Generic: ALC1220 Digital (hw:2,1)
HD-Audio Generic: ALC1220 Alt Analog (hw:2,2)
C922 Pro Stream Webcam: USB Audio (hw:3,0)
hdmi
pulse
default

My working microphones are:

{
  7: 'Scarlett 2i2 USB: Audio (hw:1,0)',
  11: 'C922 Pro Stream Webcam: USB Audio (hw:3,0)',
  13: 'pulse',
  14: 'default'
}

pgeschwill commented 7 months ago

Hi @antimatter84,

I had the exact same experience, and what helped me greatly was experimenting with the chunk_size parameter of Microphone. In my case, setting it to 512 instead of the default 1024 drastically improved the quality of the recorded audio. That also made recognition (with VOSK) much more reliable.

Give it a shot and let me know how it goes :)
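
For anyone trying this suggestion, here is a minimal sketch of passing chunk_size to the Microphone constructor, together with a small calculation of how much audio one chunk covers. The helper names chunk_duration_ms and open_microphone are hypothetical, and the 44100 Hz rate in the loop is an assumption, since the thread does not state which rate the device actually runs at.

```python
def chunk_duration_ms(chunk_size, sample_rate):
    """Milliseconds of audio covered by one read of chunk_size frames."""
    return 1000.0 * chunk_size / sample_rate


def open_microphone(device_index, chunk_size=512, sample_rate=None):
    """Create an sr.Microphone with a non-default chunk_size.

    The import is local so the helper above also works on machines
    where SpeechRecognition and PyAudio are not installed.
    """
    import speech_recognition as sr
    return sr.Microphone(device_index=device_index,
                         sample_rate=sample_rate,
                         chunk_size=chunk_size)


# Assumed rate of 44100 Hz; substitute the rate the device really uses.
for size in (1024, 512):
    print(size, "frames ->", round(chunk_duration_ms(size, 44100), 1), "ms")
```

With the device index from the original report, usage would be mic = open_microphone(7, chunk_size=512), after which the recording code stays the same. Whether a smaller chunk helps will likely depend on the device and driver.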