coqui-ai / xtts-streaming-server

Mozilla Public License 2.0

Streaming input to streaming TTS #10

Open santhosh-sp opened 6 months ago

santhosh-sp commented 6 months ago

Hello Team,

Is it possible to run streaming TTS with streaming input text, writing to the same output file?

Example:

def llm_write(prompt: str):

    for chunk in openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    ):
        if (text_chunk := chunk["choices"][0]["delta"].get("content")) is not None:
            yield text_chunk

text_stream = llm_write("Hello, what is LLM?")

audio = stream_ffplay(
    tts(
        args.text,
        speaker,
        args.language,
        args.server_url,
        args.stream_chunk_size
    ), 
    args.output_file,
    save=bool(args.output_file)
)

That is, sending a minimal number of words per request to the TTS API.

Thanks, Santhosh
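One hedged way to do this (not built into the repo; `sentence_chunks` is a hypothetical helper, not part of the server's client code) is to buffer the streamed LLM tokens into sentence-sized pieces before each TTS call, so the model always gets enough context to sound natural. A minimal sketch:

```python
import re

def sentence_chunks(token_stream):
    """Buffer streamed LLM tokens and yield roughly sentence-sized pieces.

    Flushes the buffer whenever it ends with sentence-final punctuation,
    and flushes any trailing remainder when the stream ends.
    """
    buffer = ""
    for token in token_stream:
        buffer += token
        if re.search(r"[.!?]\s*$", buffer):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():  # flush whatever is left after the stream closes
        yield buffer.strip()
```

Each yielded piece could then be passed through `tts(...)` and its audio appended to a single open output file handle, rather than reopening `args.output_file` per chunk.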

mercuryyy commented 6 months ago

Is this possible?

santhosh-sp commented 6 months ago

Yes, it's possible.

mercuryyy commented 6 months ago

Is it built into the [xtts-streaming-server] repo, or does it have to be tweaked?

I was getting ready to test it out this weekend before I install it.

mercuryyy commented 6 months ago

Any chance you can post some working examples? I was able to get the Docker image working, but I don't see any logic for providing the yielded chunks as the text to the API.

nurgel commented 6 months ago

Is splitting at the end of a sentence (`.?!`) the best option here?
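Sentence-final punctuation is a reasonable default, though a naive splitter will also break after abbreviations like "Dr." or "e.g." (a real implementation might use a tokenizer such as NLTK's punkt). A hedged sketch using a lookbehind so the punctuation stays attached to its sentence:

```python
import re

def split_sentences(text: str):
    """Split text after ., ! or ? followed by whitespace.

    Naive on purpose: abbreviations such as "Dr." will also trigger
    a split, which may or may not matter for TTS prosody.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
```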

Fusion9334 commented 5 months ago

def llm_write(prompt: str):
    buffer = ""
    for chunk in openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    ):
        if (text_chunk := chunk["choices"][0]["delta"].get("content")) is not None:
            buffer += text_chunk
            if should_send_to_tts(buffer):  # Define this function to decide when to send
                yield buffer
                buffer = ""  # Reset buffer after sending

text_stream = llm_write("Hello, what is LLM?")

for text in text_stream:
    audio = stream_ffplay(
        tts(
            text,
            speaker,
            language,
            server_url,
            stream_chunk_size
        ),
        output_file,
        save=bool(output_file)
    )
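`should_send_to_tts` is left undefined above; a minimal sketch (the 60-character fallback threshold is an arbitrary assumption) that flushes on sentence-final punctuation, or once the buffer has grown long so run-on output still flows:

```python
def should_send_to_tts(buffer: str, min_len: int = 60) -> bool:
    """Decide whether the buffered text is worth sending to the TTS API.

    Sends when the buffer ends a sentence, or as a fallback once it has
    grown past `min_len` characters.
    """
    stripped = buffer.rstrip()
    return stripped.endswith((".", "!", "?")) or len(buffer) >= min_len
```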

AI-General commented 4 months ago

I believe input needs to be at least a sentence, as speech relies heavily on the context provided by subsequent words.

oscody commented 2 weeks ago

def llm_write(prompt: str):
    buffer = ""
    for chunk in openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    ):
        if (text_chunk := chunk["choices"][0]["delta"].get("content")) is not None:
            buffer += text_chunk
            if should_send_to_tts(buffer):  # Define this function to decide when to send
                yield buffer
                buffer = ""  # Reset buffer after sending

text_stream = llm_write("Hello, what is LLM?")

for text in text_stream:
    audio = stream_ffplay(
        tts(
            text,
            speaker,
            language,
            server_url,
            stream_chunk_size
        ),
        output_file,
        save=bool(output_file)
    )

Does this work?