[Enhancement] New fully open source TTS with steerable voice characteristics

FlorianEagox / WeeaBlind

A program to dub non-English media with modern AI speech synthesis, diarization, and voice cloning!

https://tessapainter.com/project/WeeaBlind


[Enhancement] New fully open source TTS with steerable voice characteristics #21

Open phirsch opened 3 months ago

phirsch commented 3 months ago

Just wanted to bring this new TTS library and model to your attention. It allows voice characteristics to be steered via a separate prompt:

https://github.com/huggingface/parler-tts (impressive demos on the HF space linked there).

Afterthought: Wondering whether an LLM might be able to derive such prompts from a pure text transcript...
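To make the afterthought above a bit more concrete, here is a minimal sketch of how per-speaker attributes (guessed by an LLM from a transcript, or taken from diarization) could be turned into a plain-text voice description of the kind a prompt-steerable TTS expects. Everything here is an assumption for illustration: `SpeakerTraits`, `build_voice_description`, and the attribute vocabulary are hypothetical names, not part of parler-tts or WeeaBlind, and the wording a given model responds to best would need to be checked against its own documentation.

```python
from dataclasses import dataclass


@dataclass
class SpeakerTraits:
    """Hypothetical per-speaker attributes (not from any real library)."""
    gender: str = "female"
    pitch: str = "moderate"        # e.g. "low", "moderate", "high"
    pace: str = "moderate"         # e.g. "slow", "moderate", "fast"
    tone: str = "neutral"          # e.g. "animated", "monotone"
    recording: str = "very clear"  # desired audio quality


def build_voice_description(traits: SpeakerTraits) -> str:
    """Render the traits as one natural-language description string."""
    return (
        f"A {traits.gender} speaker with a {traits.pitch}-pitched voice "
        f"speaks at a {traits.pace} pace with {traits.tone} delivery. "
        f"The recording is {traits.recording}."
    )


if __name__ == "__main__":
    narrator = SpeakerTraits(gender="male", pitch="low", pace="fast", tone="animated")
    print(build_voice_description(narrator))
```

The resulting string would be passed as the description prompt alongside the text to be spoken. Keeping the traits in a small structure makes it easy to reuse the same description for every line from one speaker.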

FlorianEagox commented 3 months ago

Ooh, thanks so much for sharing this with me! I will look into it and consider integrating it if it's a good fit!

phirsch commented 3 months ago

FYI: mkiol/dsnote/issues/122 might be relevant and may unfortunately limit the usefulness of this model until huggingface/parler-tts/issues/11 is fixed or implemented.

Feel free to close the issue if you prefer.

FlorianEagox commented 3 months ago

Thanks again! I'll leave it open to remember to check out this project from time to time. <3

MethanJess commented 2 months ago

@FlorianEagox there are also other really cool TTS models you could implement if you ever get the chance:

MetaVoice: a very realistic and emotional TTS that can also clone a voice, either one-shot or via finetuning, but it requires at least 12 GB of VRAM.
MeloTTS: the results are fairly realistic and emotional, and the audio quality is really nice. It is also very lightweight, so it can generate very long sentences in less than a second, and it supports finetuning.
OpenVoice V2: pretty much just MeloTTS but with one-shot voice cloning support (it sounds worse than Melo in my opinion). Here's a demo: https://huggingface.co/spaces/myshell-ai/OpenVoiceV2