CorentinJ / Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Is there any benefit to doing the training myself? #1107

Open CodingRox82 opened 2 years ago

CodingRox82 commented 2 years ago

I see in the wiki that there is a guide on how to train the models myself. For this, I would need around 500 GB of data, which might be impractical for what I'm trying to do. What is the benefit of doing the training myself? All I want to do is use the program to generate speech for 3 voices of my choosing via code (i.e. not using the toolbox). Would I need to retrain the 3 models in order to do this?

I'm new to ML so sorry if this seems like a rookie question.

raccoonML commented 2 years ago

Finetune the synthesizer model on your own dataset; you don't have to train the encoder or vocoder. Instead of training from scratch, resume training from the pretrained models so they work better with your voices.
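
If it helps, here's a rough sketch of that workflow as a Python driver around the repo's training scripts. The dataset path and run ID are placeholders, and the exact script arguments can differ between versions, so check the training wiki for your checkout.

```python
# Rough sketch of finetuning the synthesizer on a custom dataset.
# "datasets_root" and the run ID are placeholders; verify argument names
# against the training wiki for your version of the repo.
import subprocess

datasets_root = "datasets_root"  # folder holding your prepared dataset

# 1) Preprocess audio + transcripts into <datasets_root>/SV2TTS/synthesizer.
subprocess.run(["python", "synthesizer_preprocess_audio.py", datasets_root], check=True)

# 2) Compute a speaker embedding for every preprocessed utterance.
subprocess.run(
    ["python", "synthesizer_preprocess_embeds.py", f"{datasets_root}/SV2TTS/synthesizer"],
    check=True,
)

# 3) Train. If the run directory already contains the pretrained synthesizer
#    checkpoint (copy it there first), training resumes from it instead of
#    starting from scratch.
subprocess.run(
    ["python", "synthesizer_train.py", "my_finetune_run", f"{datasets_root}/SV2TTS/synthesizer"],
    check=True,
)
```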

Before finetuning, you should get experience training a synthesizer from scratch, using a dataset that is known to work. It helps a lot.
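
And on the "via code, not the toolbox" part of the original question: the pretrained models can be driven directly from Python, roughly along the lines of the repo's demo_cli.py. A minimal sketch (model paths, reference clips, and output filenames are placeholders):

```python
from pathlib import Path

import numpy as np
import soundfile as sf

from encoder import inference as encoder
from synthesizer.inference import Synthesizer
from vocoder import inference as vocoder

# Pretrained checkpoints; adjust the paths to wherever your models live.
encoder.load_model(Path("saved_models/default/encoder.pt"))
synthesizer = Synthesizer(Path("saved_models/default/synthesizer.pt"))
vocoder.load_model(Path("saved_models/default/vocoder.pt"))

# One short reference clip per target voice (placeholder filenames).
references = {
    "alice": "voices/alice.wav",
    "bob": "voices/bob.wav",
    "carol": "voices/carol.wav",
}
text = "This sentence was generated from a cloned voice."

for name, ref_path in references.items():
    # Embed the reference utterance with the speaker encoder.
    wav = encoder.preprocess_wav(ref_path)
    embed = encoder.embed_utterance(wav)

    # Synthesize a mel spectrogram conditioned on the embedding, then vocode it.
    spec = synthesizer.synthesize_spectrograms([text], [embed])[0]
    generated = vocoder.infer_waveform(spec)

    sf.write(f"{name}_output.wav", generated.astype(np.float32), synthesizer.sample_rate)
```

No retraining is needed just to run this; finetuning only comes in if the cloned voices don't sound close enough with the pretrained models.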

tdlio commented 1 year ago

@raccoonML I'm coming to this now as I'm trying to learn this area, with the goal of getting something working at an ElevenLabs level. Where would you start to learn from these example projects? Voice seems to be a less covered, less resourced topic in AI, so I'd appreciate anything you found helpful that you'd recommend.