KoljaB / RealtimeTTS

Converts text to speech in realtime

Yield output as numpy array #37

Closed olegchomp closed 5 months ago

olegchomp commented 5 months ago

Hi! I’m interested in whether there is any way to get the audio stream output (chunks as they are generated) as a NumPy array?

KoljaB commented 5 months ago

Hey Oleg,

I think something like this should work:

from RealtimeTTS import TextToAudioStream, CoquiEngine
import numpy as np
import librosa

engine = CoquiEngine()
stream = TextToAudioStream(engine)

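def process_chunk(chunk):
    # get_stream_info() returns (format, channels, sample_rate)
    _, _, sample_rate = engine.get_stream_info()

    # Interpret the raw bytes as 16-bit PCM and scale to float32 in [-1.0, 1.0).
    # This assumes int16 output; check the format value above for your engine.
    audio_chunk = np.frombuffer(
        chunk,
        dtype=np.int16
    ).astype(np.float32) / 32768.0

    audio_chunk = librosa.resample(
        audio_chunk,
        orig_sr=sample_rate,
        target_sr=40000
    )

    # librosa.resample already returns a NumPy array
    numpy_array = audio_chunk

    # process numpy_array here (40000 Hz)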

stream.feed("Hello World")
stream.play(on_audio_chunk=process_chunk, muted=True)

I haven't tested this, so please reply if you run into problems.
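
If the goal is to literally yield arrays rather than handle them inside a callback, one option is to bridge the callback into a generator with a queue. This is only a minimal sketch and not part of RealtimeTTS: `ChunkQueue` and `pcm16_to_float32` are hypothetical helpers, and it assumes each chunk arrives as 16-bit PCM bytes. It is shown here with synthetic data instead of a real engine.

```python
import queue

import numpy as np


def pcm16_to_float32(chunk: bytes) -> np.ndarray:
    """Convert raw 16-bit PCM bytes to float32 samples in [-1.0, 1.0)."""
    return np.frombuffer(chunk, dtype=np.int16).astype(np.float32) / 32768.0


class ChunkQueue:
    """Hypothetical helper: receives byte chunks via put() and yields arrays."""

    _DONE = object()  # sentinel that marks the end of the stream

    def __init__(self):
        self._q = queue.Queue()

    def put(self, chunk: bytes) -> None:
        # Intended to be used as the audio chunk callback.
        self._q.put(chunk)

    def close(self) -> None:
        # Call once no more chunks will arrive.
        self._q.put(self._DONE)

    def __iter__(self):
        while True:
            item = self._q.get()  # blocks until a chunk or the sentinel arrives
            if item is self._DONE:
                return
            yield pcm16_to_float32(item)


# Demo with synthetic PCM data standing in for engine output.
chunks = ChunkQueue()
chunks.put(np.array([0, 16384, -32768], dtype=np.int16).tobytes())
chunks.put(np.array([32767], dtype=np.int16).tobytes())
chunks.close()

arrays = list(chunks)  # two float32 arrays, one per chunk
```

In a real setup, `chunks.put` would be the function passed as `on_audio_chunk`, with playback running in a separate thread and `close()` called once it finishes. That wiring is an assumption and untested here.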