RageAgainstThePixel / ElevenLabs-DotNet

A Non-Official ElevenLabs RESTful API Client for dotnet
https://elevenlabs.io/?from=partnerbrown9849
MIT License

Add support for TextToSpeech WebSockets #46

Open StephenHodgson opened 6 months ago

StephenHodgson commented 6 months ago

Support websockets for text to speech

ElevenLabs-DotNet-Proxy should also support forwarding websockets connections

ocinon commented 4 months ago

@StephenHodgson did you start implementing WebSockets by any chance? Also, I saw the speech-to-speech model in your 3.0.0 draft, but there is no support yet, correct?

StephenHodgson commented 4 months ago

Yes, I was already doing this for the Unity package and was considering porting it once done.

ocinon commented 4 months ago

@StephenHodgson I couldn't find any previous WebSocket implementation in your Unity repo. As I needed it, I implemented it for the DotNet version here: ocinon/ElevenLabs-DotNet@93457e124ed0397bf3532c6fd2b62c9188406d41

It extends the client slightly and tries to pick up the same patterns the repo used before. It lacks proxy support and tests. If you have any notes, let me know.

StephenHodgson commented 4 months ago

> @StephenHodgson I couldn't find any previous WebSocket implementation in your Unity repo. As I needed it, I implemented it for the DotNet version here: ocinon/ElevenLabs-DotNet@93457e1
>
> It extends the client slightly and tries to pick up the same patterns the repo used before. It lacks proxy support and tests. If you have any notes, let me know.

Feel free to open a pull request!

Only feedback is to rebase on the development branch

odillner commented 1 month ago

Any updates on this? It would be very useful in a project I'm part of.

ocinon commented 1 month ago

Sorry for never updating the thread. After some back-and-forth with ElevenLabs support, it turned out that their WebSocket implementation has a 20-second timeout. This is fine for batch conversions but makes it pretty useless for low-volume or prototyping voice-to-voice bots or similar use cases.

It might be possible to keep sending a space string (" ") as a keep-alive signal, but I stopped spending more time on it, as during testing, I didn't get speed increases compared to the REST API (but I didn't do proper testing). The code exists, and I could push it for reference.
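
For anyone who wants to try the keep-alive idea on a raw socket, here is a minimal, untested sketch. It assumes the server accepts a JSON text message of the form `{"text":" "}`; that message shape, and the `KeepAliveSketch` class and its method names, are illustrative assumptions and not part of this library. Whether the server counts such a message as activity is not established here.

```csharp
using System;
using System.Net.WebSockets;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; not part of ElevenLabs-DotNet.
public static class KeepAliveSketch
{
    // Assumed message shape: a JSON object whose "text" field is a single space.
    public static byte[] BuildKeepAlivePayload()
        => JsonSerializer.SerializeToUtf8Bytes(new { text = " " });

    // Sends the payload once per interval until the token is cancelled
    // or the socket leaves the Open state. Requires .NET 6+ (PeriodicTimer).
    // Note: a WebSocket allows only one outstanding send at a time, so the
    // caller must not send other messages concurrently with this loop.
    public static async Task RunAsync(
        WebSocket socket, TimeSpan interval, CancellationToken cancellationToken)
    {
        byte[] payload = BuildKeepAlivePayload();
        using PeriodicTimer timer = new(interval);

        try
        {
            while (socket.State == WebSocketState.Open
                   && await timer.WaitForNextTickAsync(cancellationToken))
            {
                await socket.SendAsync(
                    new ReadOnlyMemory<byte>(payload),
                    WebSocketMessageType.Text,
                    endOfMessage: true,
                    cancellationToken);
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the normal shutdown path.
        }
    }
}
```

An interval comfortably below the 20-second timeout (for example `TimeSpan.FromSeconds(10)`) would be the obvious thing to test first.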

odillner commented 1 month ago

Thanks for the quick response!

Well that's disappointing, but thanks for doing the legwork.

I'm gonna do some testing on my own, so please push the code.

ocinon commented 1 month ago

It's here: ocinon/ElevenLabs-DotNet

I updated it to the latest ElevenLabs version. Keep-alive messages don't seem to work. BUT the ElevenLabs support just told me that they added an "inactivity timeout" that raises the timeout to up to 180 seconds. I added it to the code. Happy testing!
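
If you are connecting with a raw socket instead of the fork, the timeout would be passed when the connection is opened. The sketch below assumes it is an `inactivity_timeout` query parameter on the `stream-input` URL; the URL layout, the parameter name, and the `StreamInputUri` helper are assumptions to check against the ElevenLabs API reference. Only the 180-second upper limit comes from this thread.

```csharp
using System;

// Hypothetical helper; not part of ElevenLabs-DotNet.
public static class StreamInputUri
{
    // Upper limit reported by ElevenLabs support in this thread.
    public const int MaxInactivityTimeoutSeconds = 180;

    // Builds the WebSocket URL, clamping the timeout to 1..180 seconds.
    public static Uri Build(string voiceId, string modelId, int inactivityTimeoutSeconds)
    {
        int timeout = Math.Clamp(inactivityTimeoutSeconds, 1, MaxInactivityTimeoutSeconds);

        // Assumed URL layout and query parameter names.
        return new Uri(
            $"wss://api.elevenlabs.io/v1/text-to-speech/{Uri.EscapeDataString(voiceId)}/stream-input" +
            $"?model_id={Uri.EscapeDataString(modelId)}&inactivity_timeout={timeout}");
    }
}
```

The resulting `Uri` can be handed to `ClientWebSocket.ConnectAsync`; the voice and model IDs are whatever your account uses.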

Some basic testing code:

// ELEVEN_LABS_KEY is a placeholder for your API key.
using ElevenLabsClient client = new(ELEVEN_LABS_KEY);
await using FileStream fileStream
    = new("output.mp3", FileMode.Create, FileAccess.Write, FileShare.Read);

// Open the socket. Each received audio chunk is appended to output.mp3.
// The final argument (180) is the inactivity timeout in seconds.
await client.TextToSpeechWebSocketEndpoint.StartTextToSpeechAsync(
    Voice.Arnold,
    (async voiceClip =>
    {
        if (voiceClip == null)
        {
            Console.WriteLine("Received null voice clip.");
            return;
        }

        Console.WriteLine($"Received voice clip with {voiceClip.ClipData.Length} bytes.");
        await fileStream.WriteAsync(voiceClip.ClipData);
    }),
    null, null, Model.TurboV2_5, OutputFormat.MP3_44100_128, null, null, null, 180);

// Read lines from the console and stream them to the endpoint.
// Special inputs: "exit" quits, "flush" and "trigger" send "." with the matching flag set.
while (true)
{
    Console.Write("Enter text to convert to speech: ");
    string? text = Console.ReadLine();

    // ReadLine returns null at end of input; stop instead of looping forever.
    if (text is null) { break; }

    if (text == "exit") { break; }

    bool?  flush   = text == "flush" ? true : null;
    bool   trigger = text == "trigger";
    string prompt  = text is "flush" or "trigger" ? "." : text;
    await client.TextToSpeechWebSocketEndpoint.SendTextToSpeechAsync(prompt, flush, trigger);
}

// Close the stream once the loop exits.
await client.TextToSpeechWebSocketEndpoint.EndTextToSpeechAsync();

StephenHodgson commented 1 month ago

@ocinon feel free to open a PR on the main project for everyone else to get :)

StephenHodgson commented 1 month ago

I've also been playing with WebSocket support for my OpenAI-DotNet project and will likely port over some stuff from there as well, especially around the WebSocket client. Just a bit of an abstraction layer to help keep the socket alive and listening, etc.

ocinon commented 1 month ago

@StephenHodgson should we push it into the development branch for now? Could you open that one for me?

StephenHodgson commented 1 month ago

Sure I'll push a development branch right now for you to target :)

StephenHodgson commented 1 month ago

You may want to rebase your changes though, and just make sure you've synced with upstream.

ocinon commented 1 month ago

It's up to date but not rebased. One sec.

ocinon commented 1 month ago

Done