Uberi / speech_recognition

Speech recognition module for Python, supporting several engines and APIs, online and offline.
https://pypi.python.org/pypi/SpeechRecognition/
BSD 3-Clause "New" or "Revised" License

Worse audio quality when specifying device #221

Open v-lopez opened 7 years ago

v-lopez commented 7 years ago

I have a problem with the quality of the recorded audio. I have tested on two computers with the same device. Each computer has one integrated sound card with no microphone attached, and a USB Andrea device (http://www.andreaelectronics.com/array-microphone/).

Steps to reproduce

I am using the write_audio.py example, with the addition of being able to force the device_index.

If I let Microphone pick the device automatically, it picks my integrated sound card, and I cannot record any audio that way.

If I force the device_index to the Andrea device, I can record audio, but the quality is very bad (andrea_device.wav sample).

If I change the default device via Ubuntu Sound Settings, so that the Andrea is the default device, it records audio properly and without noise (default_device.wav sample).

The problem is that I'd like my application to be able to specify the device to use, rather than being restricted to the default device. But as soon as I force the Andrea device, the audio quality worsens.
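A minimal sketch of selecting the device by name rather than by a hard-coded index. `find_device_index` and `open_microphone` are hypothetical helpers written for illustration, not part of the library; `Microphone.list_microphone_names()` is the library call they rely on, and the "andrea" substring is an assumption based on the device name reported in this issue.

```python
def find_device_index(names, needle):
    """Return the index of the first device whose name contains `needle`
    (case-insensitive), or None if nothing matches."""
    needle = needle.lower()
    for index, name in enumerate(names):
        if name is not None and needle in name.lower():
            return index
    return None


def open_microphone(needle="andrea"):
    """Build a Microphone for the first input device matching `needle`."""
    # Imported lazily so find_device_index works without audio libraries installed.
    import speech_recognition as sr

    names = sr.Microphone.list_microphone_names()
    index = find_device_index(names, needle)
    if index is None:
        raise LookupError("no device matching %r in %r" % (needle, names))
    return sr.Microphone(device_index=index)
```

Looking the index up at run time avoids depending on device enumeration order, which can differ between machines.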

I noticed that the printed device_info differs when the device is opened by forcing the device_index:

{'defaultSampleRate': 44100.0, 'defaultLowOutputLatency': -1.0, 'defaultLowInputLatency': 0.008684807256235827, 'maxInputChannels': 2L, 'structVersion': 2L, 'hostApi': 0L, 'index': 1, 'defaultHighOutputLatency': -1.0, 'maxOutputChannels': 0L, 'name': u'AndreaMA: USB Audio (hw:1,0)', 'defaultHighInputLatency': 0.034829931972789115}

And using it as the system default:

{'defaultSampleRate': 44100.0, 'defaultLowOutputLatency': 0.008707482993197279, 'defaultLowInputLatency': 0.008707482993197279, 'maxInputChannels': 32L, 'structVersion': 2L, 'hostApi': 0L, 'index': 16L, 'defaultHighOutputLatency': 0.034829931972789115, 'maxOutputChannels': 32L, 'name': u'default', 'defaultHighInputLatency': 0.034829931972789115}
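To make the differences between the two dictionaries easier to see, here is a small sketch that lists only the keys whose values differ. `diff_device_info` is a hypothetical helper, and the values are copied from the output above with the Python 2 `L` and `u` markers dropped so the snippet also runs on Python 3.

```python
def diff_device_info(forced, default):
    """Map each key whose value differs to a (forced, default) pair."""
    keys = sorted(set(forced) | set(default))
    return {
        key: (forced.get(key), default.get(key))
        for key in keys
        if forced.get(key) != default.get(key)
    }


# Device opened by forcing device_index.
forced = {
    'defaultSampleRate': 44100.0, 'defaultLowOutputLatency': -1.0,
    'defaultLowInputLatency': 0.008684807256235827, 'maxInputChannels': 2,
    'structVersion': 2, 'hostApi': 0, 'index': 1,
    'defaultHighOutputLatency': -1.0, 'maxOutputChannels': 0,
    'name': 'AndreaMA: USB Audio (hw:1,0)',
    'defaultHighInputLatency': 0.034829931972789115,
}

# Same microphone reached through the system default device.
default = {
    'defaultSampleRate': 44100.0, 'defaultLowOutputLatency': 0.008707482993197279,
    'defaultLowInputLatency': 0.008707482993197279, 'maxInputChannels': 32,
    'structVersion': 2, 'hostApi': 0, 'index': 16,
    'defaultHighOutputLatency': 0.034829931972789115, 'maxOutputChannels': 32,
    'name': 'default',
    'defaultHighInputLatency': 0.034829931972789115,
}

differences = diff_device_info(forced, default)
for key, (forced_value, default_value) in differences.items():
    print("%s: forced=%r default=%r" % (key, forced_value, default_value))
```

The sample rate is identical in both cases. What changes is the device name (`hw:1,0` versus `default`), the channel counts, and the latency values.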

System information


My system is Ubuntu 14.04 LTS x64

My Python version is 2.7.6

My Pip version is 1.5.4.

My SpeechRecognition library version is 3.6.0

My PyAudio library version is 0.2.9

Uberi commented 7 years ago

Hi @v-lopez,

What is the nature of the "bad quality"? Can you upload WAV file recordings from each, using both the write_audio example and with arecord? The arecord version is important to help diagnose the issue.

v-lopez commented 7 years ago

My bad, I had uploaded the file but deleted the link. audio_examples.zip

There are 3 files. Two were recorded using the write_audio.py example: one with the default device, and the other specifying the device_index to use the Andrea device. The third file was recorded using arecord version 1.0.27.2. As you can hear, I was not able to reproduce the noise using arecord.

To summarize, my system has only one valid microphone, which is a USB microphone. If I force the Microphone class to use it, it generates rhythmic noise in the background, as in the example "arecord_recording-forced_andrea_device.wav". If I set my system default audio input device to be the USB microphone and create the Microphone object without providing any device_index, it captures from the USB microphone without that noise. arecord always records it properly.

Uberi commented 7 years ago

Thanks for the examples!

My hypothesis now is that something's going wrong during resampling, since we're getting those strange clicks every 940k samples. Can you try replacing Microphone(...) with Microphone(sample_rate=44100, ...) in your non-default-device configuration, and seeing if that improves the situation?

If that doesn't work, does increasing the chunk_size parameter change anything about the duration/frequency of those clicks?
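A rough sketch of how those two experiments could be scripted. `chunk_duration_ms`, `record_wav`, and `record_sweep` are hypothetical helpers, and the output file names are made up; the `Microphone(device_index=..., sample_rate=..., chunk_size=...)` arguments, `Recognizer.record`, and `AudioData.get_wav_data` are the library calls being exercised.

```python
def chunk_duration_ms(chunk_size, sample_rate):
    """Length of one buffer of `chunk_size` frames at `sample_rate` Hz, in ms."""
    return 1000.0 * chunk_size / sample_rate


def record_wav(path, device_index, sample_rate=44100, chunk_size=1024, seconds=5):
    """Record `seconds` of audio from the given device and save it as a WAV file."""
    # Imported lazily so chunk_duration_ms can be used without audio libraries.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    microphone = sr.Microphone(
        device_index=device_index,
        sample_rate=sample_rate,
        chunk_size=chunk_size,
    )
    with microphone as source:
        audio = recognizer.record(source, duration=seconds)
    with open(path, "wb") as wav_file:
        wav_file.write(audio.get_wav_data())


def record_sweep(device_index, sample_rate=44100, chunk_sizes=(512, 1024, 2048)):
    """Record one file per chunk size so the click spacing can be compared."""
    for chunk_size in chunk_sizes:
        path = "forced_chunk_%d.wav" % chunk_size  # hypothetical file name
        print("%s: %.2f ms per chunk" % (path, chunk_duration_ms(chunk_size, sample_rate)))
        record_wav(path, device_index, sample_rate, chunk_size)
```

Calling `record_sweep(1)` would record three files from device index 1 (the index is an assumption taken from the device_info output earlier in the thread).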

v-lopez commented 7 years ago

Same result when forcing sample_rate=44100. Increasing chunk_size to 2048 makes the clicks more frequent, and decreasing it to 512 seems to make them disappear.

Attaching files: more_tests.zip
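For comparing recordings like these by more than ear, the following is a minimal sketch of a click detector using only the standard library. It assumes 16-bit PCM WAV input and treats any large jump between consecutive samples as a click; the `threshold` and `min_gap` defaults are guesses that would need tuning per recording, and all function names are hypothetical.

```python
import array
import sys
import wave


def read_mono_16bit(path):
    """Read a 16-bit PCM WAV file; return (first-channel samples, sample rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        data = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()  # WAV data is little-endian
    return list(data[::channels]), rate


def find_clicks(samples, threshold=8000, min_gap=64):
    """Return positions where consecutive samples differ by more than
    `threshold`, skipping jumps within `min_gap` samples of the last click."""
    clicks = []
    for i in range(1, len(samples)):
        if abs(samples[i] - samples[i - 1]) > threshold:
            if not clicks or i - clicks[-1] >= min_gap:
                clicks.append(i)
    return clicks


def click_intervals(clicks):
    """Distances, in samples, between successive clicks."""
    return [later - earlier for earlier, later in zip(clicks, clicks[1:])]
```

If the intervals turned out to be a constant multiple of chunk_size, that might point to a problem at buffer boundaries rather than in resampling, though this sketch alone cannot confirm either explanation.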