enricoros / big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.
https://big-agi.com
MIT License
5.62k stars, 1.3k forks

Add TTS vendor: Play.ht #214

Open enricoros opened 1 year ago

enricoros commented 1 year ago

As per the title. This would require an abstraction to select the current audio generation model (between ElevenLabs, OpenAI TTS, and this).

The abstraction will probably be large, but it is needed. See #213 and #205 to make sure the requirements there are respected by this new implementor.
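
To make the idea concrete, here is a minimal sketch of what such a vendor abstraction could look like. Everything in it is hypothetical: `TTSVendor`, `TTSVendorRegistry`, `createStubVendor`, and the vendor ids are illustrative names, not existing big-AGI code, and the stub `synthesize` returns a placeholder buffer instead of calling any real vendor API.

```typescript
// Hypothetical sketch: none of these names come from the big-AGI codebase.
type TTSVendorId = 'elevenlabs' | 'openai' | 'playht';

interface TTSRequest {
  text: string;
  voiceId?: string; // vendor-specific voice identifier (assumed optional)
}

interface TTSVendor {
  readonly id: TTSVendorId;
  readonly label: string;
  // True when the vendor has what it needs (e.g. an API key) to run.
  isConfigured(): boolean;
  // Resolves to encoded audio bytes; a real vendor would call its API here.
  synthesize(request: TTSRequest): Promise<Uint8Array>;
}

class TTSVendorRegistry {
  private readonly vendors: TTSVendor[] = [];

  register(vendor: TTSVendor): void {
    this.vendors.push(vendor);
  }

  // Prefer the user's choice; otherwise fall back to the first configured vendor.
  select(preferred?: TTSVendorId): TTSVendor | undefined {
    let fallback: TTSVendor | undefined;
    for (let i = 0; i < this.vendors.length; i++) {
      const vendor = this.vendors[i];
      if (!vendor.isConfigured()) continue;
      if (vendor.id === preferred) return vendor;
      if (fallback === undefined) fallback = vendor;
    }
    return fallback;
  }
}

// Stub factory standing in for real vendor clients.
function createStubVendor(id: TTSVendorId, label: string, configured: boolean): TTSVendor {
  return {
    id,
    label,
    isConfigured: () => configured,
    // Placeholder: a zero-filled buffer as long as the text, not real audio.
    synthesize: async (request: TTSRequest) => new Uint8Array(request.text.length),
  };
}

const registry = new TTSVendorRegistry();
registry.register(createStubVendor('elevenlabs', 'ElevenLabs', false));
registry.register(createStubVendor('openai', 'OpenAI TTS', true));
registry.register(createStubVendor('playht', 'Play.ht', true));
```

With this shape, adding Play.ht would mean writing one more `TTSVendor` implementation and registering it. The rest of the app would only call `registry.select(...)` and `synthesize(...)`, so it would not need to know which vendor is active.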

dagelf commented 4 months ago

Also consider locally hosted options, e.g. mimic3, which despite being "deprecated" is the most user-friendly and easy-to-set-up neural speech engine: https://community.openconversational.ai/t/mimic-3-tts-models-failing-to-load-with-invalid-protobuf-error/15164/2

Most of them are easy to set up, but this one is literally just a pip install and a model download, and with a bit of effort that can be reduced to just a pip install.

Related to this is speech-to-text, where there are also outstanding locally hosted options.

dagelf commented 4 months ago

Damn bro, you could use some extra pairs of hands. I'm only noticing the dates now.... let me get my toes wet. 👷

enricoros commented 4 months ago

> Damn bro, you could use some extra pairs of hands. I'm only noticing the dates now.... let me get my toes wet. 👷

You're very welcome to contribute! If there's something that's really pressing to you and that you'd love to see and use in the app, just download the code, implement it, and open a PR!