Open QaisarRajput opened 1 year ago
JFYI, for now the sampling rate is the only thing that can tune this a little: a higher number gives you a deeper voice (slower), while a lower number gives a thinner voice (faster).
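A minimal sketch of the sampling-rate trick, using only the Python standard library. A sine tone stands in for the synthesized waveform, and the 16000 Hz native rate, `synth_stand_in`, `write_wav`, and `playback_seconds` are all assumptions made for illustration, not part of MMS or fairseq. The direction of the effect depends on where the rate is changed: when only the rate declared at write time changes, as here, a higher value plays back faster and higher-pitched, and a lower value plays back slower and deeper.

```python
import io
import math
import struct
import wave

GENERATED_RATE = 16000  # assumed native rate of the synthesized audio


def synth_stand_in(seconds=1.0, freq=220.0, rate=GENERATED_RATE):
    """Return 16-bit PCM samples: a sine tone standing in for TTS output."""
    n = int(seconds * rate)
    return [int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / rate))
            for i in range(n)]


def write_wav(samples, declared_rate):
    """Serialize samples as mono 16-bit WAV bytes with the given header rate."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(declared_rate)  # only the header changes, not the samples
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


def playback_seconds(wav_bytes):
    """Duration a player would report for the WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


if __name__ == "__main__":
    samples = synth_stand_in()
    for rate in (12000, 16000, 22050):
        print(rate, round(playback_seconds(write_wav(samples, rate)), 3))
```

Because the samples themselves are untouched, speed and pitch always move together with this approach; it cannot produce a genuinely different speaker.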
@QaisarRajput Controllable generation (e.g., changing gender or emotion) is not supported yet. You could consider cascading the MMS TTS model with an off-the-shelf voice cloning model to achieve this.
Could you please name a VITS-based voice cloning repo that can achieve this? I found that directly fine-tuning the Korean model gives very bad results.
Not sure how this would work, but here is one example for voice conversion.
I suggest looking into Coqui, which has recipes for using MMS-TTS (fairseq) alongside voice cloning; I've used it successfully to change the voice's gender.
Regarding emotion, etc., Bark looks promising, but I haven't tested it yet.
Bark seems to be very slow, albeit more powerful.
❓ Questions and Help
What is your question?
I am using MMS TTS and it's amazing. So far, for one language (eng) there is only one speaker's voice. Are there any parameters or random seeds that can be changed to get an entirely different person's voice, without fine-tuning? Even if we can't control emotion or, say, voice pitch, can we at least get a random new, natural-sounding speaker?
Code
What have you tried?
MMS TTS and Hugging Face mms-tts
What's your environment?
How you installed fairseq (pip, source): pip