[Roadmap] Add support for more/generic TTS output sources

enricoros / big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.

https://big-agi.com

MIT License

5.67k stars 1.31k forks

[Roadmap] Add support for more/generic TTS output sources #451

Open PylotLight opened 9 months ago

PylotLight commented 9 months ago

Why: Currently, TTS support is hardcoded for ElevenLabs only, instead of allowing a generic /TTS endpoint that returns an audio file or stream, which could then be sourced from any other self-hosted or external provider. While this request is focused on a self-hosted setup, it would work for non-local use as well.

Description: In the voice settings menu, allow a "custom" TTS endpoint where you would enter, e.g., http://localhost/tts, perhaps with body/payload parameter settings that are sent to the provider at runtime. Then consume the returned audio file and use it as normal.
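
As a rough illustration of the custom endpoint idea above, here is a minimal sketch of how the settings might be turned into an HTTP request. Everything named here (`CustomTtsSettings`, `textField`, `extraPayload`, `buildCustomTtsRequest`) is hypothetical and not part of big-AGI; the JSON body shape is also an assumption, since each TTS server defines its own.

```typescript
// Hypothetical settings a user might enter in the voice settings menu.
interface CustomTtsSettings {
  endpointUrl: string; // e.g. "http://localhost/tts"
  textField: string; // assumed name of the body field carrying the text
  extraPayload: Record<string, unknown>; // user-defined params sent at runtime
}

// Transport-agnostic description of the HTTP call, so it is easy to inspect.
interface TtsHttpRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

function buildCustomTtsRequest(settings: CustomTtsSettings, text: string): TtsHttpRequest {
  // Reject anything that is not plain HTTP(S) before a network call is attempted.
  if (!/^https?:\/\//i.test(settings.endpointUrl)) {
    throw new Error(`Unsupported TTS endpoint URL: ${settings.endpointUrl}`);
  }
  // Spread the user payload first so the text field always wins on a name clash.
  const payload = { ...settings.extraPayload, [settings.textField]: text };
  return {
    url: settings.endpointUrl,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  };
}
```

The returned object could be handed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, with the audio then read from the response as a blob or stream, depending on what the server returns.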

Requirements: If you can, please break down the changes: use cases, UX, technology, architecture, etc.

[ ] Add new menu option to voice settings menu
[ ] Un-hardcode the ElevenLabs (11ai) support and change to a generic TTS stream/static audio file format that can be consumed regardless of provider.
[ ] Based on selection in voice menu, use the relevant provider to generate audio.
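
One possible shape for the checklist above is a small provider registry: each backend implements the same contract, and the voice menu selection picks which one generates the audio. This is only a sketch under assumptions; `TtsProvider`, `TtsProviderRegistry`, and their methods are hypothetical names, not existing big-AGI code.

```typescript
// Hypothetical contract that every backend (ElevenLabs included) would implement.
interface TtsProvider {
  id: string; // stable key stored in the voice settings
  label: string; // text shown in the voice settings menu
  synthesize(text: string): Promise<ArrayBuffer>; // encoded audio bytes
}

class TtsProviderRegistry {
  private providers = new Map<string, TtsProvider>();

  register(provider: TtsProvider): void {
    this.providers.set(provider.id, provider);
  }

  // Entries for populating the menu, in registration order.
  list(): { id: string; label: string }[] {
    return Array.from(this.providers.values()).map((p) => ({ id: p.id, label: p.label }));
  }

  // Dispatch to whichever provider is currently selected in the menu.
  async speak(selectedId: string, text: string): Promise<ArrayBuffer> {
    const provider = this.providers.get(selectedId);
    if (!provider) {
      throw new Error(`Unknown TTS provider: ${selectedId}`);
    }
    return provider.synthesize(text);
  }
}
```

With this layout, adding a new backend would mean registering one more object, while the playback code only ever sees the returned audio bytes.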

enricoros commented 8 months ago

Thanks, this is a good request @PylotLight. I think the next support should come through the LocalAI TTS models.

What are your favorite options for alternative TTS?

[ ] LocalAI
[ ] Browser default TTS (Web Speech API)
[ ] OpenAI TTS
[ ] Play.ht
[ ] ..?

PylotLight commented 8 months ago

So we obviously have a goal of supporting as much as possible with as little integration work as possible, right?

How generic can we make it in terms of providing a TTS endpoint and getting back streamed audio?

enricoros commented 8 months ago

From Discord

PylotLight commented 8 months ago

Note: Azure has some decent free TTS/STT API options as well, as another option to add to the list: https://azure.microsoft.com/en-us/products/ai-services/text-to-speech

danielw97 commented 6 months ago

Just a +1 for this. If you do end up supporting LocalAI, it's worth noting that its TTS endpoint is OpenAI API compatible, although there is an additional backend field that can be specified to change the TTS system. It defaults to Piper, which is a fast TTS system with basic quality. I'm not a developer, so unfortunately I can't help implement this, but I would be happy to test and provide feedback if that is useful. Thanks for your work on this project; I've only recently discovered it and am impressed so far.

PylotLight commented 6 months ago

The only thing I'd want to note is that there are other options than LocalAI, so while it's definitely a good option to support, a custom OpenAI-compatible endpoint should be allowed as well. TL;DR: make a custom OpenAI-spec TTS endpoint with a custom URL so any OpenAI-compatible server can be consumed here.
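
To make the OpenAI-compatible suggestion concrete, here is a minimal sketch of building a speech request against a user-supplied base URL. The `/v1/audio/speech` path and the `model`, `input`, and `voice` body fields follow the OpenAI speech API; the optional `backend` field is included only because it was mentioned above for LocalAI, and its exact placement in the body is an assumption. The type and function names are hypothetical.

```typescript
// Hypothetical config for any server that speaks the OpenAI speech API.
interface OpenAiCompatibleTtsConfig {
  baseUrl: string; // custom server root, e.g. "http://localhost:8080" (example value)
  model: string;
  voice: string;
  apiKey?: string; // optional, since many self-hosted servers need none
  backend?: string; // extra field described for LocalAI; omitted when unset
}

interface SpeechHttpRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildSpeechRequest(config: OpenAiCompatibleTtsConfig, input: string): SpeechHttpRequest {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
  if (config.apiKey) {
    headers['Authorization'] = `Bearer ${config.apiKey}`;
  }

  const body: Record<string, string> = { model: config.model, input, voice: config.voice };
  if (config.backend) {
    body['backend'] = config.backend; // only sent when the user configured one
  }

  return {
    // Trim trailing slashes so "http://host/" and "http://host" behave the same.
    url: `${config.baseUrl.replace(/\/+$/, '')}/v1/audio/speech`,
    headers,
    body: JSON.stringify(body),
  };
}
```

Because only the base URL differs, the same builder would cover OpenAI itself, LocalAI, or any other compatible server; the request can be sent as a POST with `fetch` and the audio read from the response body.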

tassa-yoniso-manasi-karoto commented 4 months ago

I second Azure TTS. They have extensive language support, it's inexpensive, and their TTS quality is second only to 11labs.

I wanted to use this for conversations for language learning, but 11labs still doesn't support Thai... :/