XTTS: add inference_stream_text (slightly friendlier for text-streaming)

czuzu commented 1 month ago

Hello,

I'm doing TTS streaming locally, where the text itself also arrives progressively over a stream (text-streaming). In theory, inference_stream is already enough for this case. The exception is its setup code at the start, which runs again on every call. Repeating it is not very costly, but it would be nicer to be able to skip it after the first call, since it is no longer needed:

```python
language = language.split("-")[0]  # remove the country code
length_scale = 1.0 / max(speed, 0.05)
gpt_cond_latent = gpt_cond_latent.to(self.device)  # nicer to be able to skip when doing text-streaming
speaker_embedding = speaker_embedding.to(self.device)  # nicer to be able to skip when doing text-streaming
```
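For reference, apart from the device moves, the per-call setup quoted above is a small amount of arithmetic that could be computed once per stream. Below is a minimal sketch of hoisting it out. It assumes a hypothetical `prepare_stream` helper and a `StreamSetup` container, neither of which exists in XTTS. The `.to(self.device)` moves are left out because they need torch tensors:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamSetup:
    """Per-stream values that inference_stream recomputes on every call (hypothetical helper)."""
    language: str
    length_scale: float


def prepare_stream(language: str, speed: float = 1.0) -> StreamSetup:
    # Same arithmetic as the snippet above: strip the country code and
    # clamp speed at 0.05, so length_scale never exceeds 1 / 0.05 = 20.
    return StreamSetup(
        language=language.split("-")[0],
        length_scale=1.0 / max(speed, 0.05),
    )


# Computed once when the text stream starts, then reused for every text piece.
setup = prepare_stream("en-US", speed=2.0)
print(setup)  # StreamSetup(language='en', length_scale=0.5)
```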

So I've added inference_stream_text (maybe not the best name; let me know if you prefer another) specifically for text-streaming, e.g.:

```python
# Assumes `model` is a loaded XTTS model and that `gpt_cond_latent` and `speaker_embedding`
# come from model.get_conditioning_latents(...), as in the inference_stream example.

def text_streaming_generator():
    yield "It took me quite a long time to develop a voice and now that I have it I am not going to be silent."
    yield "Having discovered not just one, but many voices, I will champion each."

print("Inference with text streaming...")

text_gen = text_streaming_generator()
inf_gen = model.inference_stream_text(
    # note: the `text` param is not provided, as it will be streamed
    "en",
    gpt_cond_latent,
    speaker_embedding,
)

wav_chunks = []
for text in text_gen:
    # Add text progressively
    model.inference_add_text(text, enable_text_splitting=True)
    # Iterate the chunks directly: enumerate() would yield (index, chunk) tuples,
    # so the None check below would never fire and chunk.shape would fail.
    for chunk in inf_gen:
        if chunk is None:
            break  # all chunks generated for the current text
        print(f"Received chunk {len(wav_chunks)} of audio length {chunk.shape[-1]}")
        wav_chunks.append(chunk)

# Call finalize to discard the inference generator
model.inference_finalize_text()
```
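To make the control flow of the example above concrete, here is a toy, dependency-free model of the same protocol. A single long-lived generator yields chunks for whatever text has been queued. When the queue is drained, it yields None instead of terminating, and an explicit finalize call closes it. The names (`TextStreamer`, `add_text`, `stream`, `finalize`) are hypothetical stand-ins, and words stand in for audio chunks. This is a sketch of the idea, not the PR's implementation:

```python
from collections import deque
from typing import Deque, Generator, List, Optional

# Hypothetical toy model of the protocol; not the XTTS implementation.
ChunkGen = Generator[Optional[str], None, None]


class TextStreamer:
    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self._gen: Optional[ChunkGen] = None

    def add_text(self, text: str) -> None:
        # Queue text; it is consumed the next time the generator is iterated.
        self._pending.append(text)

    def stream(self) -> ChunkGen:
        # Create one long-lived generator, so any one-time setup runs only once.
        self._gen = self._run()
        return self._gen

    def _run(self) -> ChunkGen:
        # One-time setup (language code, length_scale, device moves) would go here.
        while True:
            while self._pending:
                text = self._pending.popleft()
                # Stand-in for synthesis: emit one "chunk" per word.
                for word in text.split():
                    yield word
            # Queue drained: yield None as a "need more text" signal instead of stopping.
            yield None

    def finalize(self) -> None:
        # Close the suspended generator; iterating it afterwards yields nothing.
        if self._gen is not None:
            self._gen.close()
            self._gen = None


streamer = TextStreamer()
gen = streamer.stream()
chunks: List[str] = []
for text in ["Hello there world.", "Second sentence."]:
    streamer.add_text(text)
    for chunk in gen:
        if chunk is None:
            break  # everything queued so far has been emitted
        chunks.append(chunk)
streamer.finalize()
print(chunks)  # ['Hello', 'there', 'world.', 'Second', 'sentence.']
```

Breaking out of a `for` loop does not close the generator. That is what lets the same generator resume when more text arrives, and why an explicit finalize step is needed to shut it down.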

IMO this also makes for a nicer interface for text-streaming, but I'll leave it to you to decide :)

Cheers! 🍻

CLAassistant commented 1 month ago

All committers have signed the CLA.

czuzu commented 1 month ago

I wonder: is this repo still maintained, or do I have to move the PR?

czuzu commented 1 month ago

Moved here: https://github.com/idiap/coqui-ai-TTS/pull/21

coqui-ai / TTS

XTTS: add inference_stream_text (slightly friendlier for text-streaming) #3724