Open PylotLight opened 9 months ago
Thanks, this is a good request @PylotLight. I think the next support should come through the LocalAI TTS models.
What are your favorite options for alternative TTS?
So we obviously have a goal of supporting as much as possible with as little integration work as possible, right?
How generic can we make it, in terms of pointing at a TTS endpoint and getting back streamed audio?
From Discord
Note that Azure has some decent free TTS/STT API options as well, just as another option to add to the list: https://azure.microsoft.com/en-us/products/ai-services/text-to-speech
Just a +1 for this. If you do end up supporting LocalAI, it's worth noting that its TTS endpoint is OpenAI API compatible, although there is an additional backend field that can be specified to change the TTS system. It defaults to Piper, which is a fast TTS system with basic quality. I'm not a developer, so unfortunately I can't help implement this, but I would be happy to test and provide feedback if that is useful. Thanks for your work on this project. I've only recently discovered it and am impressed so far.
The only thing I'd want to note is that there are other options than LocalAI, so while it's definitely a good option to support, a custom OpenAI-compatible endpoint should be allowed as well. TL;DR: make a custom OpenAI-spec TTS endpoint with a configurable URL, so any OpenAI-compatible server can be consumed here.
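To make the "custom URL" idea above concrete, here is a minimal sketch of a client that targets any OpenAI-style speech endpoint. It assumes the server exposes `POST /v1/audio/speech` with a JSON body of `model`, `input`, and `voice`, as the OpenAI API does. The function names, the default model and voice values, and the `extra_fields` parameter are illustrative and not taken from this project. The `backend` field is the LocalAI extension mentioned earlier in the thread, and its exact semantics are assumed here.

```python
import json
import urllib.request
from typing import Optional


def build_speech_request(
    base_url: str,
    text: str,
    model: str = "tts-1",   # OpenAI default; other servers may expect other names
    voice: str = "alloy",   # OpenAI default; other servers may expect other names
    api_key: Optional[str] = None,
    extra_fields: Optional[dict] = None,
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-style /v1/audio/speech endpoint."""
    body = {"model": model, "input": text, "voice": voice}
    # Server-specific extras, e.g. the LocalAI "backend" field (assumed).
    body.update(extra_fields or {})

    headers = {"Content-Type": "application/json"}
    if api_key:
        # Self-hosted servers often need no key, so the header is optional.
        headers["Authorization"] = f"Bearer {api_key}"

    url = base_url.rstrip("/") + "/v1/audio/speech"
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


def synthesize(request: urllib.request.Request, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw audio bytes from the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With this shape, switching providers is only a matter of changing `base_url` (for example a placeholder such as `http://localhost:8080`) and, where needed, passing `extra_fields={"backend": "piper"}`.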
I second Azure TTS. They have extensive language support, it's inexpensive, and their TTS quality is second only to 11labs.
I wanted to use this for conversations for language learning, but 11labs still doesn't support Thai... :/
Why: Currently, TTS support is hardcoded for ElevenLabs only. A generic TTS endpoint that accepts text and returns an audio file or stream would let the audio be sourced from any self-hosted or external provider. While this request is focused on a self-hosted setup, it would work for non-local use as well.
Description: In the voice settings menu, allow a "custom" TTS endpoint where you would enter a URL, e.g. http://localhost/tts, perhaps along with custom body/payload parameters that are sent to the provider at runtime. Then consume the returned audio file and use it as normal.
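One way the settings described above could be modelled is sketched below. Everything here is hypothetical: the `CustomTTSSettings` fields, the `text_field` name, and the helper functions are invented for illustration and do not reflect the project's actual settings schema. The sketch only shows how user-supplied payload parameters might be merged with the text at runtime, and how a returned audio stream might be consumed in chunks.

```python
import json
from dataclasses import dataclass, field
from typing import BinaryIO, Iterator


@dataclass
class CustomTTSSettings:
    """Hypothetical user-configurable settings for a custom TTS endpoint."""
    url: str                      # e.g. "http://localhost/tts"
    text_field: str = "text"      # JSON key the provider expects the text under
    extra_payload: dict = field(default_factory=dict)  # custom body parameters


def build_payload(settings: CustomTTSSettings, text: str) -> bytes:
    """Merge the user's custom parameters with the text to be spoken."""
    payload = dict(settings.extra_payload)  # copy so the settings stay unchanged
    payload[settings.text_field] = text
    return json.dumps(payload).encode("utf-8")


def iter_audio_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio bytes as they arrive instead of waiting for the whole file."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

`iter_audio_chunks` accepts any binary file-like object, so it could wrap an HTTP response body for streamed playback or a fully downloaded file, which covers both the "file" and "stream" cases in the request.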
Requirements: If you can, please break down the changes: use cases, UX, technology, architecture, etc.