elevenlabs / elevenlabs-js

The official JavaScript (Node) library for ElevenLabs Text to Speech.
https://elevenlabs.io
MIT License

Similar Latencies in Text-to-Speech for Streaming and Standard Requests #42

Open Kaanayden opened 1 month ago

Kaanayden commented 1 month ago

Hello ElevenLabs Team,

I am trying to develop a text-to-speech Discord bot. To reduce latency, I have tried streaming voice generation, but I'm seeing nearly identical latencies with and without streaming. I am using v0.5.0.

With streaming:

// Client created beforehand, e.g.:
//   import { ElevenLabsClient } from "elevenlabs";
//   const elevenlabs = new ElevenLabsClient({ apiKey: process.env.ELEVENLABS_API_KEY });
const audioStream = await elevenlabs.generate({
    stream: true,
    voice: "Josh",
    text: text,
    model_id: "eleven_multilingual_v2",
    optimize_streaming_latency: 2,
    voice_settings: {
        stability: 0.5,
        similarity_boost: 0.8,
        style: 0.0,
        use_speaker_boost: true,
    }
});
/*
Output times (in milliseconds): 
2143
2148
2142
3678
*/

Without streaming:

const audioStream = await elevenlabs.generate({
    stream: false,
    voice: "Josh",
    text: text,
    model_id: "eleven_multilingual_v2",
    optimize_streaming_latency: 2,
    voice_settings: {
        stability: 0.5,
        similarity_boost: 0.8,
        style: 0.0,
        use_speaker_boost: true,
    }
});
/*
Output times (in milliseconds):
2145
2222
2241
2268
*/
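
For reference, the timings above were measured roughly like this (simple wall-clock timing around the call; elevenlabs and text are set up as in the snippets above, and the extra "data" listener is a check on when the first audio bytes actually arrive):

const start = Date.now();

const audioStream = await elevenlabs.generate({
    stream: true,
    voice: "Josh",
    text: text,
    model_id: "eleven_multilingual_v2",
    optimize_streaming_latency: 2,
});

// Time until the generate() promise resolves.
console.log(`generate resolved after ${Date.now() - start} ms`);

// Time until the first audio chunk arrives on the returned stream.
// Note: attaching a "data" listener starts consuming the stream.
audioStream.once("data", () => {
    console.log(`first audio chunk after ${Date.now() - start} ms`);
});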

Is this normal, or is there an error in my code? Is streaming expected to have latencies similar to non-streaming under these settings? I have also tried the direct API streaming endpoint (the POST request described at https://elevenlabs.io/docs/api-reference/streaming) and got similar results.
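
That direct-endpoint test looked roughly like this (Node 18+ fetch; the voice ID and API key are placeholders, text as above):

const start = Date.now();

const response = await fetch(
    "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID/stream?optimize_streaming_latency=2",
    {
        method: "POST",
        headers: {
            "xi-api-key": process.env.ELEVENLABS_API_KEY,
            "Content-Type": "application/json",
        },
        body: JSON.stringify({
            text: text,
            model_id: "eleven_multilingual_v2",
            voice_settings: { stability: 0.5, similarity_boost: 0.8 },
        }),
    }
);

// Time to response headers vs. time to the first audio chunk.
console.log(`headers after ${Date.now() - start} ms`);
const reader = response.body.getReader();
await reader.read();
console.log(`first chunk after ${Date.now() - start} ms`);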

I appreciate any guidance or insights you can provide!