FlorianEagox / WeeaBlind

A program to dub non-English media with modern AI speech synthesis, diarization, and voice cloning!
https://tessapainter.com/project/WeeaBlind

[Enhancement] New fully open source TTS with steerable voice characteristics #21

Open phirsch opened 7 months ago

phirsch commented 7 months ago

Just wanted to bring this new TTS library and model to your attention. It allows voice characteristics to be steered via a separate prompt:

https://github.com/huggingface/parler-tts (impressive demos on the HF space linked there).

Afterthought: Wondering whether an LLM might be able to derive such prompts from a pure text transcript...
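
A rough sketch of that idea, assuming the transcript is already split into per-speaker segments. The `Segment` type and the `build_description_request` helper are hypothetical names made up for illustration; the code only assembles the instruction text that would be sent to an LLM and does not call any model or any Parler-TTS API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcript line attributed to a speaker (hypothetical structure)."""
    speaker: str
    text: str


def build_description_request(segments: List[Segment], speaker: str,
                              max_lines: int = 5) -> str:
    """Build an LLM instruction asking for a voice description of one speaker.

    Only the first `max_lines` lines of that speaker are included, to keep
    the request short.
    """
    lines = [s.text for s in segments if s.speaker == speaker][:max_lines]
    if not lines:
        raise ValueError(f"no transcript lines found for speaker {speaker!r}")
    quoted = "\n".join(f"- {line}" for line in lines)
    return (
        "Below are lines spoken by one character in a video.\n"
        "Write a one-sentence description of how the voice should sound "
        "(pace, pitch, emotion), suitable as a TTS voice prompt.\n\n"
        f"Speaker: {speaker}\n"
        f"Lines:\n{quoted}\n"
    )
```

The returned string would go to whatever LLM is available, and the model's answer could then be tried as the steering prompt for the TTS model. Whether a text-only transcript carries enough information for a useful description is an open question.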

FlorianEagox commented 7 months ago

Ooh, thanks so much for sharing this with me! I will look into it and consider integrating it if it's a good fit!

phirsch commented 7 months ago

FYI: mkiol/dsnote/issues/122 might be relevant. Unfortunately, it may limit the usefulness of this model until huggingface/parler-tts/issues/11 is fixed or implemented.

Feel free to close the issue if you prefer.

FlorianEagox commented 7 months ago

Thanks again! I'll leave it open to remember to check out this project from time to time. <3

MethanJess commented 6 months ago

@FlorianEagox there are also other really cool TTS models you could implement if you ever get the chance.

phirsch commented 1 month ago

And there is another new steerable open-source model that looks promising (it even seems to support translation internally, but only EN/CN for now):

https://github.com/SWivid/F5-TTS

MethanJess commented 3 weeks ago

Honestly, I really loved the new GPT-SoVITS V2. It also generates really fast.