Open StephenHodgson opened 6 months ago
@StephenHodgson did you start implementing WebSockets by any chance? Also, I saw the speech-to-speech model in your 3.0.0 draft, but there is no support yet, correct?
Yes, I was already doing this for the Unity package and was considering porting it once it's done.
@StephenHodgson I couldn't find any previous WebSocket implementation in your Unity repo. As I needed it, I implemented it for the DotNet version here: ocinon/ElevenLabs-DotNet@93457e124ed0397bf3532c6fd2b62c9188406d41
It extends the client slightly and tries to pick up the same patterns the repo used before. It lacks proxy support and tests. If you have any notes, let me know.
Feel free to open a pull request!
My only feedback is to rebase on the development branch.
Any updates on this? It would be very useful in a project I'm part of.
Sorry for never updating the thread. After some back-and-forth with ElevenLabs support, it turned out that their WebSocket implementation has a 20-second timeout. This is fine for batch conversions, but it makes the socket pretty useless for low-volume use or for prototyping voice-to-voice bots and similar use cases, where the gaps between messages easily exceed 20 seconds.
It might be possible to keep sending a space string (" ") as a keep-alive signal, but I stopped spending time on it because, during testing, I didn't see any speed increase compared to the REST API (though I didn't do proper testing). The code exists, and I could push it for reference.
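For anyone who wants to experiment with the keep-alive idea, here is a minimal sketch of what it could look like. It assumes the socket accepts JSON text frames of the form `{"text":" "}`; that payload shape, the `KeepAlive` class, and its method names are illustrative assumptions, not code from the fork. A later comment in this thread reports that keep-alive messages did not seem to help, so treat this as an experiment, not a fix.

```csharp
using System;
using System.Net.WebSockets;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of ElevenLabs-DotNet.
public static class KeepAlive
{
    // Assumed payload: a single space as the text, serialized as {"text":" "}.
    public static byte[] BuildPayload()
        => JsonSerializer.SerializeToUtf8Bytes(new { text = " " });

    // Sends the payload every `interval` until cancelled or the socket closes.
    // WebSocket allows only one outstanding send at a time, so the caller
    // must not send other messages concurrently with this loop.
    public static async Task RunAsync(
        WebSocket socket, TimeSpan interval, CancellationToken cancellationToken)
    {
        byte[] payload = BuildPayload();

        try
        {
            while (!cancellationToken.IsCancellationRequested)
            {
                await Task.Delay(interval, cancellationToken);

                if (socket.State != WebSocketState.Open) { break; }

                await socket.SendAsync(
                    new ArraySegment<byte>(payload),
                    WebSocketMessageType.Text,
                    true, // endOfMessage
                    cancellationToken);
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the normal way to stop the loop.
        }
    }
}
```

With a 20-second server timeout, an interval somewhere below that (for example `TimeSpan.FromSeconds(10)`) would be the obvious thing to try.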
Thanks for the quick response!
Well that's disappointing, but thanks for doing the legwork.
I'm gonna do some testing on my own, so please push the code.
It's here: ocinon/ElevenLabs-DotNet
I updated it to the latest ElevenLabs version. Keep-alive messages don't seem to work. BUT the ElevenLabs support just told me that they added an "inactivity timeout" that raises the timeout to up to 180 seconds. I added it to the code. Happy testing!
Some basic testing code:
    using ElevenLabsClient client = new(ELEVEN_LABS_KEY);
    await using FileStream fileStream =
        new("output.mp3", FileMode.Create, FileAccess.Write, FileShare.Read);

    // Open the socket; the callback appends every received clip to output.mp3.
    await client.TextToSpeechWebSocketEndpoint.StartTextToSpeechAsync(
        Voice.Arnold,
        async voiceClip =>
        {
            if (voiceClip == null)
            {
                Console.WriteLine("Received null voice clip.");
                return;
            }

            Console.WriteLine(
                $"Received voice clip with {voiceClip.ClipData.Length} bytes.");
            await fileStream.WriteAsync(voiceClip.ClipData);
        },
        null, null, Model.TurboV2_5, OutputFormat.MP3_44100_128, null, null, null,
        180); // presumably the new inactivity timeout, in seconds

    while (true)
    {
        Console.Write("Enter text to convert to speech: ");
        string? text = Console.ReadLine();

        // ReadLine returns null at end of input; stop instead of looping forever.
        if (text is null) { break; }
        if (text == "exit") { break; }

        // "flush" and "trigger" are control words; a "." is sent as the text for them.
        bool? flush = text == "flush" ? true : null;
        bool trigger = text == "trigger";
        string prompt = text is "flush" or "trigger" ? "." : text;

        await client.TextToSpeechWebSocketEndpoint.SendTextToSpeechAsync(prompt, flush, trigger);
    }

    await client.TextToSpeechWebSocketEndpoint.EndTextToSpeechAsync();
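For context on the inactivity timeout mentioned above, here is a rough sketch of how the connection URL might carry it. The endpoint path and the `model_id` and `inactivity_timeout` query parameter names are assumptions on my part and are not taken from this thread or from the fork, so check them against the current ElevenLabs API reference. `StreamInputUri` is a hypothetical helper, and the lower bound of 1 second is a guess; only the 180-second upper bound comes from the comment above.

```csharp
using System;

// Hypothetical helper, not part of ElevenLabs-DotNet.
public static class StreamInputUri
{
    // Assumed endpoint base; verify against the ElevenLabs API reference.
    private const string BaseUrl = "wss://api.elevenlabs.io/v1/text-to-speech";

    public static Uri Build(string voiceId, string modelId, int inactivityTimeoutSeconds = 20)
    {
        // The thread reports an upper limit of 180 seconds; the lower bound is a guess.
        if (inactivityTimeoutSeconds < 1 || inactivityTimeoutSeconds > 180)
        {
            throw new ArgumentOutOfRangeException(
                nameof(inactivityTimeoutSeconds),
                "Expected a value between 1 and 180 seconds.");
        }

        // Assumed query parameter names: model_id and inactivity_timeout.
        string query =
            $"model_id={Uri.EscapeDataString(modelId)}" +
            $"&inactivity_timeout={inactivityTimeoutSeconds}";

        return new Uri($"{BaseUrl}/{Uri.EscapeDataString(voiceId)}/stream-input?{query}");
    }
}
```

Validating the range on the client side just surfaces a bad value before the connection attempt instead of leaving it to the server.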
@ocinon feel free to open a PR on the main project for everyone else to get :)
I've also been playing with WebSocket support for my OpenAI-DotNet project and will likely port some things over from there as well, especially around the WebSocket client: just a bit of an abstraction layer to help keep the socket alive, listening, etc.
@StephenHodgson should we push it into the development branch for now? Could you open that one for me?
Sure I'll push a development branch right now for you to target :)
You may want to rebase your changes though, and just make sure you've synced with upstream.
It's up to date but not rebased. One sec.
Done
Support websockets for text to speech
ElevenLabs-DotNet-Proxy should also support forwarding websockets connections