elevenlabs / elevenlabs-python

The official Python API for ElevenLabs Text to Speech.
https://elevenlabs.io/docs/api-reference/getting-started
MIT License

(request): Support for streaming without mpv #290

Open abiel-lozano opened 5 months ago

abiel-lozano commented 5 months ago

Currently, the only way to stream audio is to use mpv, but it would be nice to have a way to stream audio using a Python-only solution, without installing any additional software other than the library. For my specific use case, I unfortunately cannot rely on external programs.

Has anyone found a way to do this? I'm not so sure of my solution because it seems over-complicated to me.


I was able to stream audio from the API with pyaudio by using the supported PCM output formats. It requires threading, so that pyaudio stays initialized and keeps playing while chunks are still being received, and a queue, so that chunks are played in order and a chunk that arrives before the current one has finished does not play over it.
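
For readers who want to see the thread-plus-queue idea in isolation, here is a minimal sketch that runs without pyaudio. The names `play_from_queue`, `stream_in_background`, and the `write` callback are hypothetical and not part of the library; `write` stands in for whatever actually sends bytes to the audio device, and a `None` sentinel is assumed as the end-of-stream signal.

```python
import queue
import threading
from typing import Callable, Iterable, List, Optional


def play_from_queue(chunks: "queue.Queue[Optional[bytes]]", write: Callable[[bytes], None]) -> None:
    """Consumer: write chunks in arrival order until the None sentinel arrives."""
    # iter(get, None) calls get() exactly once per item and stops on None
    for data in iter(chunks.get, None):
        write(data)


def stream_in_background(audio_stream: Iterable[bytes], write: Callable[[bytes], None]) -> bytes:
    """Producer: queue chunks as they arrive while a worker thread plays them."""
    chunks: "queue.Queue[Optional[bytes]]" = queue.Queue()
    worker = threading.Thread(target=play_from_queue, args=(chunks, write))
    worker.start()
    received = b""
    try:
        for chunk in audio_stream:
            if chunk:  # skip empty chunks
                chunks.put(chunk)
                received += chunk
    finally:
        chunks.put(None)  # sentinel: tell the worker to stop
        worker.join()     # wait until everything queued has been written
    return received


# Stand-in for an audio device: collect what would have been played
played: List[bytes] = []
result = stream_in_background(iter([b"\x00\x01", b"", b"\x02\x03"]), played.append)
```

With pyaudio, `write` would presumably be the `write` method of an output stream opened with `pyaudio.paInt16`, one channel, and the sample rate of the chosen PCM format.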

Here's an example of what this method would look like as part of the library.

I am currently unable to test this method from within the library, because I ran out of monthly credits while getting it to play audio smoothly and with low latency. At first I tested it with eleven_multilingual_v2, which caused unrelated playback issues (see #114). However, it is essentially a simplified version of what I got working consistently in a messy test script for the open source project I'm using ElevenLabs for.

From src\elevenlabs\play.py, in line 62:

# New function: plays chunks from the queue until it receives the None sentinel
def playFromQueue(audioQueue: "queue.Queue", sampleRate: int) -> None:
    import pyaudio  # Imported here so pyaudio stays an optional dependency

    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=sampleRate, output=True)

    # iter() calls get() exactly once per chunk and stops at the None sentinel
    for data in iter(audioQueue.get, None):
        stream.write(data)

    stream.stop_stream()
    stream.close()
    p.terminate()

# Modified stream function, mpv is still selected by default
def stream(audio_stream: Iterator[bytes], use_mpv: bool = True, output_format: Optional[str] = None) -> bytes:
    if use_mpv:
        if not is_installed("mpv"):
            message = (
                "mpv not found, necessary to stream audio. "
                "On mac you can install it with 'brew install mpv'. "
                "On linux and windows you can install it from https://mpv.io/. "
                "Or you can use stream(audio_stream, use_mpv=False, output_format=<supported PCM format>) instead."
            )
            raise ValueError(message)

        mpv_command = ["mpv", "--no-cache", "--no-terminal", "--", "fd://0"]
        mpv_process = subprocess.Popen(
            mpv_command,
            stdin=subprocess.PIPE,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )

        audio = b""

        for chunk in audio_stream:
            if chunk is not None:
                mpv_process.stdin.write(chunk)  # type: ignore
                mpv_process.stdin.flush()  # type: ignore
                audio += chunk
        if mpv_process.stdin:
            mpv_process.stdin.close()
        mpv_process.wait()

        return audio
    else:
        try:
            import threading
            import queue
            import pyaudio  # noqa: F401 (fail early here if pyaudio is missing)

            # Sample rate is indicated in the output format notation (e.g. "pcm_16000"), extracted for ease of use
            sampleRate = int(output_format.split("_")[1])

            audioQueue = queue.Queue()

            player = threading.Thread(target=playFromQueue, args=(audioQueue, sampleRate))
            player.start()

            audio = b""

            try:
                for chunk in audio_stream:
                    if chunk:
                        audioQueue.put(chunk)
                        audio += chunk
            finally:
                audioQueue.put(None)  # Sentinel: tells the player thread to stop
                player.join()  # Wait until all queued audio has been played

            return audio

        except Exception as e:
            message = (
                "Error while trying to stream audio using pyaudio. "
                "Make sure you have pyaudio installed and that output_format is a supported PCM format. "
                "Or you can use stream(audio_stream, use_mpv=True) instead."
            )
            raise ValueError(message) from e

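
The sample-rate extraction in the proposal relies on the format name looking like `pcm_<rate>`. As a rough sketch of how that parsing could be made stricter, the hypothetical helper below (`pcm_sample_rate` is not a library function) rejects anything that is not a PCM format instead of silently reading a number out of a name such as `mp3_44100_128`.

```python
def pcm_sample_rate(output_format: str) -> int:
    """Return the sample rate encoded in a 'pcm_<rate>' format name."""
    codec, _, rate = output_format.partition("_")
    if codec != "pcm" or not rate.isdigit():
        raise ValueError(
            f"expected a PCM format such as 'pcm_16000', got {output_format!r}"
        )
    return int(rate)


rate = pcm_sample_rate("pcm_16000")
```

Failing early like this would give a clearer error than the generic pyaudio message when a caller passes a compressed format by mistake.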
teis-e commented 4 months ago

Hello, great work on the code. Can you show me an example of how you would use it?

I'm having issues getting the stream from ElevenLabs:

response = eleven_labs_client.generate(
    text=text_stream,
    output_format="ulaw_8000",
    voice=Voice(
        voice_id='EuQDS5FR7dD1UoZkGdbZ',
        settings=VoiceSettings(stability=0.5, similarity_boost=0.8, style=0.21, use_speaker_boost=True)
    ),
    model="eleven_multilingual_v2",
    stream=True
)

audio_stream = stream(response)

# Write each chunk of audio data to the stream
for chunk in audio_stream:  # response:
    if chunk:
        # This is not working very well: I either get just 1 chunk, or I get this error: a bytes-like object is required, not 'int'
        audio_output = base64.b64encode(chunk).decode("utf-8")

I hope you can help me :)
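
A note on the quoted error: it is consistent with `stream()` returning the whole audio as a single `bytes` object, as its `-> bytes` signature indicates. Iterating over `bytes` in Python yields one `int` per byte, and `base64.b64encode` rejects an `int`. The sketch below reproduces this with a hypothetical stand-in generator (`fake_response` is not part of the library) and shows that iterating over the generator itself yields `bytes` chunks.

```python
import base64
from typing import Iterator, List


def fake_response() -> Iterator[bytes]:
    """Hypothetical stand-in for the chunk generator returned with stream=True."""
    yield b"\x7f\x80"
    yield b"\xff\x00"


# A single bytes object, like the return value of a function typed "-> bytes"
joined = b"".join(fake_response())

# Iterating over bytes yields ints, which b64encode rejects
first_item = next(iter(joined))
try:
    base64.b64encode(first_item)  # type: ignore[arg-type]
    error = ""
except TypeError as exc:
    error = str(exc)

# Iterating over the generator itself yields one bytes chunk at a time
encoded: List[str] = [
    base64.b64encode(chunk).decode("utf-8") for chunk in fake_response() if chunk
]
```

If that is what is happening here, looping over `response` directly (as the commented-out `# response:` hints) rather than over the return value of `stream(response)` may be what is needed.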

abiel-lozano commented 4 months ago

What issues are you having? Audio stutter, or no audio at all?

I do have one working example in this messy test script. Just ignore the ChatGPT request code, and keep in mind that it uses websockets to connect to the API rather than the library function(s) you have in your code.

Please consider that my method only worked with the supported PCM formats, and I did not test it with ulaw_8000, which you are using, because it was incompatible with my use case.
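
For anyone who wants to try a PCM-only player with `ulaw_8000` anyway, one untested option would be to expand each chunk to 16-bit PCM before queueing it. The sketch below assumes that `ulaw_8000` means standard 8-bit G.711 μ-law, mono, at 8000 Hz. That is an assumption, not something confirmed in this thread, and the helper names are hypothetical.

```python
import struct


def ulaw_byte_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit sample."""
    u = ~byte & 0xFF                # mu-law bytes are stored bit-inverted
    exponent = (u >> 4) & 0x07      # 3-bit segment number
    mantissa = u & 0x0F             # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if u & 0x80 else magnitude


def ulaw_to_pcm16(chunk: bytes) -> bytes:
    """Expand a mu-law chunk to little-endian 16-bit PCM (2 bytes per sample)."""
    samples = [ulaw_byte_to_pcm16(b) for b in chunk]
    return struct.pack(f"<{len(samples)}h", *samples)


pcm = ulaw_to_pcm16(b"\xff\x80\x00")
```

The decoded bytes could then be written to a pyaudio stream opened with `pyaudio.paInt16` at a rate of 8000.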