[Roadmap] Use OpenAI TTS models for speech generation to avoid having too many API keys

enricoros / big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.

https://big-agi.com

MIT License

5.62k stars 1.3k forks

[Roadmap] Use OpenAI TTS models for speech generation to avoid having too many API keys #590

Open nick-harder opened 4 months ago

nick-harder commented 4 months ago

Why: Having to register additionally with ElevenLabs for voice generation is a hassle. Also, they don't provide "pay as you go" plans. OpenAI has a pretty good speech generation model, TTS-1, and it can be used directly with the OpenAI API key, which simplifies setup and makes the app more comfortable to use.

Description: Use the OpenAI TTS model by default if an OpenAI API key is provided. Make ElevenLabs optional, available only if its API key is added, and allow the user to select which one to use (similar to the image generation selection).
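The selection rule described above could be sketched roughly as follows. This is only an illustration: `TtsProviderId`, `TtsKeys`, `availableTtsProviders` and `selectTtsProvider` are hypothetical names, not existing big-AGI code, and the sketch assumes that a non-empty key string is enough to treat a provider as usable.

```typescript
// Hypothetical names throughout: none of these types or functions exist in big-AGI.
type TtsProviderId = 'openai' | 'elevenlabs';

interface TtsKeys {
  openaiApiKey?: string;
  elevenLabsApiKey?: string;
}

// Providers usable with the configured keys, in default-preference order
// (OpenAI first, so it becomes the default whenever its key is present).
function availableTtsProviders(keys: TtsKeys): TtsProviderId[] {
  const available: TtsProviderId[] = [];
  if (keys.openaiApiKey?.trim()) available.push('openai');
  if (keys.elevenLabsApiKey?.trim()) available.push('elevenlabs');
  return available;
}

// Honor the user's explicit choice when that provider is usable;
// otherwise fall back to the first available provider, or null if none is configured.
function selectTtsProvider(keys: TtsKeys, preferred?: TtsProviderId): TtsProviderId | null {
  const available = availableTtsProviders(keys);
  if (preferred && available.includes(preferred)) return preferred;
  return available[0] ?? null;
}
```

With only an OpenAI key set, `selectTtsProvider` returns `'openai'`; with both keys set, a stored user preference of `'elevenlabs'` wins, which mirrors how the image generation selection is described.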

enricoros commented 4 months ago

Thanks @nick-harder, this will require some abstraction to switch and configure the models (and add/remove them dynamically alongside model providers, as one could have multiple OpenAI services set up, for instance).

Code is welcome here. I'm not sure I can tackle this in the short term, as I'm implementing the full multimodal pipeline now.
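As a starting point for contributors, the abstraction mentioned in this comment might look something like the registry below. It is a minimal sketch under stated assumptions, not the maintainers' design: `TtsService` and `TtsRegistry` are hypothetical names, each configured service instance is assumed to have its own unique `id` (so several OpenAI setups can coexist), and `synthesize` is assumed to return raw audio bytes.

```typescript
// Hypothetical sketch: these types are illustrative, not big-AGI's actual code.
interface TtsService {
  id: string;                          // unique per configured instance, e.g. 'openai-1'
  vendor: 'openai' | 'elevenlabs';
  label: string;                       // human-readable name for a settings dropdown
  synthesize(text: string): Promise<ArrayBuffer>;
}

class TtsRegistry {
  private services = new Map<string, TtsService>();
  private activeId: string | null = null;

  // Register a service; the first one added becomes the active one.
  add(service: TtsService): void {
    this.services.set(service.id, service);
    if (this.activeId === null) this.activeId = service.id;
  }

  // Remove a service; if it was active, fall back to the oldest remaining one.
  remove(id: string): void {
    this.services.delete(id);
    if (this.activeId === id) {
      const remaining = Array.from(this.services.keys());
      this.activeId = remaining.length > 0 ? remaining[0] : null;
    }
  }

  // Switch the active service, rejecting ids that were never registered.
  setActive(id: string): void {
    if (!this.services.has(id)) throw new Error(`Unknown TTS service: ${id}`);
    this.activeId = id;
  }

  getActive(): TtsService | null {
    if (this.activeId === null) return null;
    return this.services.get(this.activeId) ?? null;
  }

  list(): TtsService[] {
    return Array.from(this.services.values());
  }
}
```

Keying the registry by instance id rather than by vendor is what lets two OpenAI services exist side by side; adding or removing a model provider would then call `add` or `remove` on the same registry.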