bastibe / SoundCard

A Pure-Python Real-Time Audio Library
https://soundcard.readthedocs.io
BSD 3-Clause "New" or "Revised" License

numpy-int16 as recording output #34

Closed: d3ck-org closed this issue 4 years ago

d3ck-org commented 5 years ago

Hi, I am working on a speech/wake-word detection project and want to test related frameworks and libraries. The microphone must be part of a Bluetooth headset, and with Python libraries this is sometimes problematic. Currently I am testing Porcupine. Porcupine (like many other detection frameworks) uses PyAudio to talk to the sound card, but PyAudio seems to detect only the ALSA devices (the internal sound card) and ignores the PulseAudio layer (where the Bluetooth headset lives).

Headset detection works like a charm with your SoundCard library (on Ubuntu 18.04 with the PulseAudio sound server)! So I would like to replace the following PyAudio code in Porcupine (and perhaps in other word-detection libraries) with SoundCard:

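import numpy
import pyaudio

# Excerpt from Porcupine: `porcupine` is a Porcupine instance and
# self._input_device_index is the selected input device (None = default).
pa = pyaudio.PyAudio()

# Open a mono, 16-bit input stream at Porcupine's sample rate
audio_stream = pa.open(
    rate=porcupine.sample_rate,
    channels=1,
    format=pyaudio.paInt16,
    input=True,
    frames_per_buffer=porcupine.frame_length,
    input_device_index=self._input_device_index)

while True:
    # read() returns raw bytes: frame_length samples, 2 bytes each
    pcm = audio_stream.read(porcupine.frame_length)
    pcm = tuple(numpy.frombuffer(pcm, 'int16'))   # same as:  struct.unpack_from("h" * porcupine.frame_length, pcm)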
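The inline comment claims that `numpy.frombuffer(pcm, 'int16')` and `struct.unpack_from("h" * n, pcm)` decode the raw bytes identically. A small sketch that checks this on a synthetic chunk, assuming native byte order (both calls use it by default); `FRAME_LENGTH = 4` is a hypothetical value, since Porcupine reports its own via `porcupine.frame_length`:

```python
import struct

import numpy as np

# Hypothetical frame length, kept tiny for illustration.
FRAME_LENGTH = 4

# Simulate one raw chunk as PyAudio returns it for paInt16 mono input:
# FRAME_LENGTH signed 16-bit samples in native byte order (2 bytes each).
pcm = np.array([0, 1, -1, 32767], dtype=np.int16).tobytes()

# NumPy version from the snippet: reinterpret the bytes as int16 samples.
via_numpy = tuple(np.frombuffer(pcm, dtype=np.int16))

# Standard-library version: "h" is a native signed short.
via_struct = struct.unpack_from("h" * FRAME_LENGTH, pcm)

print(via_struct)               # (0, 1, -1, 32767)
print(via_numpy == via_struct)  # True
```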

As you can see, what gets fed to the word detection is a tuple of int16 samples (decoded via a numpy array), but SoundCard returns a float32 numpy array. My quick and very naive attempt was to change the sample format and numpy dtype inside SoundCard, but of course that didn't give the desired result :)
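The exact changes tried aren't shown, so this is only an assumption about what can go wrong: switching the dtype on an existing float32 buffer *reinterprets* its bytes instead of converting the sample values. A sketch of the difference, with made-up sample values:

```python
import numpy as np

# Hypothetical float32 samples, in the [-1, 1] range SoundCard uses.
float_data = np.array([0.0, 0.5, -0.5, 1.0], dtype=np.float32)

# Reinterpreting the same bytes as int16 does NOT convert the values:
# each 4-byte float is split into two meaningless 16-bit halves.
reinterpreted = float_data.view(np.int16)

# Converting scales each value into the int16 range instead
# (astype truncates toward zero; 32767 keeps +1.0 within int16).
converted = (float_data * 32767).astype(np.int16)

print(reinterpreted.size)  # 8: twice as many "samples" as were recorded
print(converted.tolist())  # [0, 16383, -16383, 32767]
```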

Unfortunately I don't have any experience with or knowledge of audio-related coding, and I don't have the time to familiarise myself with the topic because I must concentrate on my project goals. So it would be really great if you could give me a pointer on how to adjust the SoundCard functions to return int16 arrays :) Thank you very much.

bastibe commented 5 years ago

You could just convert it manually:

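import numpy
import soundcard
float_data = soundcard.default_microphone().record(samplerate=48000, numframes=1024)  # float32, range [-1, 1]
int16_data = (numpy.clip(float_data, -1, 1) * 32767).astype('int16')  # clip first so 1.0 maps to 32767, not an overflowing 32768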

But really, you should use float everywhere. int16 has no headroom above full scale: any signal exceeding [-1, 1] must either be clipped or overflow when converted, and both produce horrible clicking artifacts.
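If a detector insists on int16 anyway, the conversion can live in a thin adapter around SoundCard's recorder instead of inside SoundCard itself. The sketch below assumes a recorder object with a `record(numframes=...)` method returning a float array of shape `(numframes, channels)`, as in SoundCard's README `recorder()` example; `int16_frames` and `float_to_int16` are hypothetical helper names, and averaging the channels into mono is an assumed downmix choice:

```python
from typing import Iterator, Protocol

import numpy as np


class Recorder(Protocol):
    # Assumed interface: record(numframes=...) returns float samples
    # of shape (numframes, channels) in the range [-1, 1].
    def record(self, numframes: int) -> np.ndarray: ...


def float_to_int16(block: np.ndarray) -> np.ndarray:
    """Convert float samples to int16, clipping anything outside [-1, 1]."""
    clipped = np.clip(block, -1.0, 1.0)
    return np.round(clipped * 32767).astype(np.int16)


def int16_frames(recorder: Recorder, frame_length: int) -> Iterator[tuple]:
    """Hypothetical adapter: endlessly yield tuples of int16 samples."""
    while True:
        data = recorder.record(numframes=frame_length)
        # Assumed downmix: average all channels into one (stays in float).
        mono = data.mean(axis=1) if data.ndim == 2 else data
        yield tuple(float_to_int16(mono))
```

Usage would look roughly like `with soundcard.default_microphone().recorder(samplerate=porcupine.sample_rate) as rec:` followed by `for pcm in int16_frames(rec, porcupine.frame_length): ...`. Keeping the signal in float until the very last step means clipping happens once, at the boundary to the int16-only detector.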