cartesia-ai / cartesia-python

The official Python library for the Cartesia API
https://pypi.org/project/cartesia/
MIT License

Allow passing `AsyncGenerator[str]` to `AsyncCartesia` context #48

Open kalradivyanshu opened 2 weeks ago

from cartesia import Cartesia
import pyaudio
import os

client = Cartesia(api_key=os.environ.get("CARTESIA_API_KEY"))

transcripts = [
    "The crew engaged in a range of activities designed to mirror those "
    "they might perform on a real Mars mission. ",
    "Aside from growing vegetables and maintaining their habitat, they faced "
    "additional stressors like communication delays with Earth, ",
    "up to twenty-two minutes each way, to simulate the distance from Mars to our planet. ",
    "These exercises were critical for understanding how astronauts can "
    "maintain not just physical health but also mental well-being under such challenging conditions. ",
]

# Ending each transcript with a space makes the audio smoother
def chunk_generator(transcripts):
    for transcript in transcripts:
        if transcript.endswith(" "):
            yield transcript
        else:
            yield transcript + " "
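For the async version being requested, the same chunking logic could be written as an async generator. This is just a sketch of the generator side — the point of this issue is that `AsyncCartesia`'s context does not currently accept this type:

```python
import asyncio
from typing import AsyncGenerator, Iterable


async def async_chunk_generator(transcripts: Iterable[str]) -> AsyncGenerator[str, None]:
    # Same smoothing rule as the sync version: every chunk ends with a space.
    for transcript in transcripts:
        if transcript.endswith(" "):
            yield transcript
        else:
            yield transcript + " "


async def main():
    # Drain the async generator to show each chunk ends with a space.
    chunks = [chunk async for chunk in async_chunk_generator(["Hello", "world "])]
    print(chunks)


asyncio.run(main())
```

In a real pipeline the chunks would typically come from another async source (e.g. a streaming LLM response) rather than a fixed list, which is why an `AsyncGenerator[str]` input is useful.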

# You can check out voice IDs by calling `client.voices.list()` or on https://play.cartesia.ai/
voice_id = "87748186-23bb-4158-a1eb-332911b0b708"

# You can check out our models at https://docs.cartesia.ai/getting-started/available-models
model_id = "sonic-english"

# You can find the supported `output_format`s at https://docs.cartesia.ai/api-reference/endpoints/stream-speech-server-sent-events
output_format = {
    "container": "raw",
    "encoding": "pcm_f32le",
    "sample_rate": 44100,
}
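For reference, `pcm_f32le` means each sample is a 32-bit little-endian float (4 bytes per sample), which is why the audio buffers below are written straight to a `pyaudio.paFloat32` stream. A quick illustration with the standard `struct` module (not part of the Cartesia API):

```python
import struct

# pcm_f32le: 32-bit little-endian floats, 4 bytes per sample.
samples = [0.0, 0.5, -0.5]
buffer = struct.pack("<%df" % len(samples), *samples)
print(len(buffer))  # 3 samples * 4 bytes = 12 bytes
```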

p = pyaudio.PyAudio()
rate = 44100

stream = None

# Set up the websocket connection
ws = client.tts.websocket()

# Create a context to send and receive audio
ctx = ws.context()  # Generates a random context ID if not provided

# Pass in a text generator to generate & stream the audio
output_stream = ctx.send(
    model_id=model_id,
    transcript=chunk_generator(transcripts),
    voice_id=voice_id,
    output_format=output_format,
)

for output in output_stream:
    buffer = output["audio"]

    if not stream:
        stream = p.open(format=pyaudio.paFloat32, channels=1, rate=rate, output=True)

    # Write the audio data to the stream
    stream.write(buffer)

if stream is not None:
    stream.stop_stream()
    stream.close()
p.terminate()

ws.close()  # Close the websocket connection

It would be great to have this for async as well 👍
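To make the shape of the request concrete, here is a minimal stub of what the context would need to do internally — consume an `AsyncGenerator[str]` and forward each chunk. This is hypothetical: `fake_ctx_send` is a stand-in, not the real `AsyncCartesia` API.

```python
import asyncio
from typing import AsyncGenerator


async def fake_ctx_send(transcript: AsyncGenerator[str, None]) -> list[str]:
    # Stand-in for what an AsyncCartesia context could do internally:
    # drain the async generator and forward each chunk as it arrives.
    sent = []
    async for chunk in transcript:
        sent.append(chunk)  # the real client would send this over the websocket
    return sent


async def demo():
    async def gen():
        for t in ["Hello ", "world "]:
            yield t

    print(await fake_ctx_send(gen()))


asyncio.run(demo())
```

The key property is that chunks are forwarded as they are produced, so audio generation can start before the full transcript exists.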