FlorianEagox / WeeaBlind

A program to dub non-English media with modern AI speech synthesis, diarization, and voice cloning!
https://tessapainter.com/project/WeeaBlind
246 stars, 23 forks

[Enhancement] New fully open source TTS with steerable voice characteristics #21

Open phirsch opened 3 months ago

phirsch commented 3 months ago

Just wanted to bring this new TTS library and model to your attention, which allows voice characteristics to be steered via a separate prompt:

https://github.com/huggingface/parler-tts (impressive demos on the HF space linked there).

Afterthought: Wondering whether an LLM might be able to derive such prompts from a pure text transcript...
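
For illustration, here is a rough sketch of how that could look, assuming the Python API shown in the parler-tts README. The `SpeakerTraits` class and the `describe_voice` and `synthesize` helpers are hypothetical names made up for this example, and the checkpoint name is an assumption as well. The traits could come from an LLM reading the transcript or from any other source; the sketch only shows how they would be turned into a description and passed alongside the text.

```python
from dataclasses import dataclass


@dataclass
class SpeakerTraits:
    """Hypothetical per-speaker attributes, e.g. guessed from a transcript."""
    gender: str = "female"
    pace: str = "moderate"
    tone: str = "expressive"
    quality: str = "very clear"


def describe_voice(traits: SpeakerTraits) -> str:
    """Turn the attributes into a plain-English voice description."""
    return (
        f"A {traits.gender} speaker with a {traits.tone} voice "
        f"speaks at a {traits.pace} pace. The recording is {traits.quality}."
    )


def synthesize(text: str, description: str, out_path: str = "line.wav") -> None:
    """Generate speech for `text` in the style given by `description`."""
    # Heavy imports are kept local so the helpers above work without them.
    import soundfile as sf
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    name = "parler-tts/parler_tts_mini_v0.1"  # checkpoint name is an assumption
    model = ParlerTTSForConditionalGeneration.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)

    # The description steers the voice; the prompt is the text to be spoken.
    description_ids = tokenizer(description, return_tensors="pt").input_ids
    prompt_ids = tokenizer(text, return_tensors="pt").input_ids
    audio = model.generate(input_ids=description_ids, prompt_input_ids=prompt_ids)
    sf.write(out_path, audio.cpu().numpy().squeeze(), model.config.sampling_rate)
```

Calling `synthesize("Hello there!", describe_voice(SpeakerTraits(gender="male")))` would then download the model and write a WAV file, provided `parler_tts`, `transformers`, and `soundfile` are installed.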

FlorianEagox commented 3 months ago

Ooh thanx so much for sharing this with me! I will look into it and consider integrating it if it's a good fit!

phirsch commented 3 months ago

FYI: mkiol/dsnote/issues/122 might be relevant and may unfortunately limit the usefulness of this model until huggingface/parler-tts/issues/11 is fixed/implemented.

Feel free to close the issue if you prefer.

FlorianEagox commented 3 months ago

Thanks again! I'll leave it open to remember to check out this project from time to time. <3

MethanJess commented 2 months ago

@FlorianEagox there are also other really cool TTS models you could implement if you ever get the chance.