KoljaB / RealtimeSTT

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
MIT License

input pcm buffer_size issue #22

Open jacobtang opened 7 months ago

jacobtang commented 7 months ago

In the audio_recorder, BUFFER_SIZE = 512 and self.buffer_size = BUFFER_SIZE. When I call feed_audio(self, chunk), the input data size from our real-time server is 640 or 768 (16 kHz, mono). Should I change the buffer_size (512) in the audio_recorder? Thanks!
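To put the 640/768 figures in perspective, the sketch below converts a raw PCM chunk size into a duration. It assumes the figures are byte counts of 16-bit samples; the question does not say whether they are bytes or samples. The helper name chunk_duration_ms is hypothetical.

```python
def chunk_duration_ms(num_bytes, sample_rate=16000, channels=1, sample_width=2):
    """Duration in milliseconds of a raw PCM chunk of num_bytes bytes.

    Assumes interleaved frames of `channels` samples, each `sample_width` bytes.
    """
    bytes_per_frame = channels * sample_width
    if num_bytes % bytes_per_frame:
        raise ValueError("chunk does not contain a whole number of frames")
    frames = num_bytes // bytes_per_frame
    return 1000.0 * frames / sample_rate


for size in (640, 768):
    print(size, "bytes ->", chunk_duration_ms(size), "ms")  # 20.0 ms, 24.0 ms
```

Under that assumption the server sends 20 ms or 24 ms of 16 kHz mono audio per chunk.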

KoljaB commented 7 months ago

No, just leave it as it is. The feed_audio method takes care of the buffer size.
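The thread does not show how feed_audio handles chunk sizes internally. Purely as an illustration of the general idea, the sketch below re-buffers arbitrary-sized chunks into fixed-size blocks and carries any remainder over to the next call. The class ChunkRebuffer and the choice to count block_size in bytes are hypothetical, not RealtimeSTT's implementation.

```python
class ChunkRebuffer:
    """Hypothetical helper: accepts byte chunks of any size and returns
    fixed-size blocks, keeping leftover bytes for the next feed() call."""

    def __init__(self, block_size=512):
        self.block_size = block_size
        self._pending = bytearray()

    def feed(self, chunk):
        self._pending.extend(chunk)
        blocks = []
        # Emit as many complete blocks as the pending data allows
        while len(self._pending) >= self.block_size:
            blocks.append(bytes(self._pending[:self.block_size]))
            del self._pending[:self.block_size]
        return blocks
```

Feeding a 640-byte chunk yields one 512-byte block and leaves 128 bytes pending; a following 768-byte chunk brings the total to 896, yields one more block and leaves 384 bytes.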

You just need to ensure that the input chunks are raw PCM data at a 16000 Hz sample rate:

```python
import numpy as np
from scipy.signal import resample

def decode_and_resample(
        audio_data,
        original_sample_rate,
        target_sample_rate):

    # Decode 16-bit PCM data to numpy array
    audio_np = np.frombuffer(audio_data, dtype=np.int16)

    # Calculate the number of samples after resampling
    num_original_samples = len(audio_np)
    num_target_samples = int(num_original_samples * target_sample_rate /
                             original_sample_rate)

    # Resample the audio (FFT-based; the result is float64)
    resampled_audio = resample(audio_np, num_target_samples)

    # Clip to the int16 range so the cast below cannot wrap around
    resampled_audio = np.clip(resampled_audio, -32768, 32767)

    return resampled_audio.astype(np.int16).tobytes()

# chunk: raw 16-bit mono PCM bytes; sample_rate: the chunk's original rate
resampled_chunk = decode_and_resample(chunk, sample_rate, 16000)
recorder.feed_audio(resampled_chunk)
```
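scipy.signal.resample is FFT-based and treats each chunk as one period of a periodic signal, which can cause artifacts at the edges of short streamed chunks. When the rates have a simple ratio (48000 to 16000 is exactly 3:1), a polyphase resampler is a common alternative. The sketch below swaps in scipy.signal.resample_poly, which low-pass filters before decimating; the function name decode_and_resample_poly is hypothetical, and the input is still assumed to be 16-bit mono PCM.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def decode_and_resample_poly(audio_data, original_sample_rate,
                             target_sample_rate=16000):
    """Hypothetical variant: resample 16-bit mono PCM bytes with a polyphase filter."""
    audio_np = np.frombuffer(audio_data, dtype=np.int16).astype(np.float32)
    # Reduce the rate ratio, e.g. 48000 -> 16000 gives up=1, down=3
    g = gcd(original_sample_rate, target_sample_rate)
    up, down = target_sample_rate // g, original_sample_rate // g
    # resample_poly applies an anti-aliasing FIR filter while changing the rate
    resampled = resample_poly(audio_np, up, down)
    # Clip to the int16 range before converting back to raw bytes
    return np.clip(resampled, -32768, 32767).astype(np.int16).tobytes()
```

Each chunk is still filtered independently (zero-padded at its ends by default), so a small transient at chunk boundaries remains; a stateful streaming filter would avoid that at the cost of more code.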
jacobtang commented 7 months ago

Thanks! Can feed_audio accept 16 kHz stereo data? The raw data is 48 kHz stereo, so after I call decode_and_resample the audio is 16 kHz stereo. In my real-time server I cannot get recorder.text() in a loop from a thread; maybe the data I feed is not correct.
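For the threading part of the question, one common pattern is to keep feeding audio from the server callback while a separate worker thread calls recorder.text() in a loop and passes results back through a queue. This sketch assumes only what the thread mentions: a recorder object with a text() method that returns a string (presumably blocking until an utterance is transcribed). The helper start_transcription_loop is hypothetical and not part of RealtimeSTT, and the sketch makes no claim about the library's thread-safety.

```python
import queue
import threading


def start_transcription_loop(recorder, results, stop):
    """Hypothetical helper: call recorder.text() repeatedly on a daemon
    thread and put each non-empty result into the `results` queue."""

    def worker():
        while not stop.is_set():
            text = recorder.text()  # assumed to block until text is ready
            if text:
                results.put(text)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

If text() never returns, the usual suspect is the input itself (wrong channel count or sample rate) rather than the threading.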

KoljaB commented 7 months ago

It should be mono, 16000 Hz, 16-bit PCM.
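decode_and_resample reads the bytes as a single channel, so interleaved 48 kHz stereo would be treated as one channel at 96000 samples per second and the output would not be valid 16 kHz mono. A minimal fix is to downmix to mono before resampling. The sketch assumes interleaved frames (left, right, left, right, ...) of 16-bit samples in native byte order (little-endian on common platforms); stereo_to_mono_int16 is a hypothetical helper name.

```python
import numpy as np


def stereo_to_mono_int16(audio_data):
    """Downmix interleaved 16-bit stereo PCM bytes to 16-bit mono PCM bytes."""
    samples = np.frombuffer(audio_data, dtype=np.int16)
    if samples.size % 2:
        raise ValueError("stereo data must contain an even number of samples")
    # One row per frame: column 0 is left, column 1 is right
    frames = samples.reshape(-1, 2).astype(np.int32)
    # Average the channels in int32 so the sum cannot overflow int16
    mono = (frames[:, 0] + frames[:, 1]) // 2
    return mono.astype(np.int16).tobytes()
```

Usage, with chunk and recorder as in the earlier reply: mono_48k = stereo_to_mono_int16(chunk), then recorder.feed_audio(decode_and_resample(mono_48k, 48000, 16000)).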