Open baljeetrathi opened 2 years ago
Hi @sveneschlbeck Would it be possible for you to guide me here?
Thanks. :)
Hi @raccoonML and @ireneb612 you guys also seem to know how to train a model yourself. Could you help me here?
Thanks. :)
What I would do is take the pretrained models, which work well for English, and fine-tune them on the 12 minutes of your voice! You just have to put the data in the right format and run synthesizer_preprocess_audio, then synthesizer_preprocess_embeds, and then synthesizer_train.
I personally used the repository with the older directory setup for the saved models, but it's not a big difference; the saved models are now all in the same directory, so only the paths change.
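To get "the data in the right format", here is a minimal sketch of arranging your clips before running the three scripts. It assumes that, with the --no_alignments flag, synthesizer_preprocess_audio.py scans <datasets_root>/LibriSpeech/train-clean-100/<speaker>/<book>/ for audio files, each paired with a same-named .txt transcript; the speaker/book folder names and the build_dataset helper are made up for illustration, so check synthesizer/preprocess.py in your checkout before relying on this layout:

```python
# Sketch: arrange (audio, transcript) pairs into the LibriSpeech-style
# layout that synthesizer_preprocess_audio.py is assumed to expect with
# --no_alignments. Folder names "p240" and "book-001" are arbitrary.
from pathlib import Path
import shutil

def build_dataset(clips, datasets_root):
    """clips: iterable of (path_to_wav_or_flac, transcript_string)."""
    book_dir = (Path(datasets_root) / "LibriSpeech" / "train-clean-100"
                / "p240" / "book-001")
    book_dir.mkdir(parents=True, exist_ok=True)
    for audio_path, transcript in clips:
        audio_path = Path(audio_path)
        dest = book_dir / audio_path.name
        shutil.copy(audio_path, dest)
        # one transcript .txt per utterance, same stem as the audio file
        dest.with_suffix(".txt").write_text(transcript.strip() + "\n")
    return book_dir
```

After that you would point the scripts at datasets_root, e.g. `python synthesizer_preprocess_audio.py <datasets_root> --no_alignments`, followed by synthesizer_preprocess_embeds and synthesizer_train.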
Thank you very much @ireneb612 . :)
I cloned the repo and my current directory structure is like this:
encoder
samples
synthesizer
toolbox
vocoder
synthesizer_preprocess_audio.py
synthesizer_preprocess_embeds.py
synthesizer_train.py
etc.
Issue 437 mentions the following directions for training:
Here is a [preprocessed p240 dataset](https://www.dropbox.com/s/qskoopjcdjdwuvw/dataset_p240.zip?dl=0) if you would like to repeat this experiment. The embeds for utterances 002-380 are overwritten with the one for 001, as the hardcoding makes for a more consistent result. Use the audio file p240_001.flac to generate embeddings for inference. The audios are not included to keep the file size down, so if you care to do vocoder training you will need to get and preprocess VCTK.
Directions:
Copy the folder synthesizer/saved_models/logs-pretrained to logs-vctkp240 in the same location. This will make a copy of your pretrained model to be finetuned.
Unzip the dataset files to datasets_p240 in your Real-Time-Voice-Cloning folder (or somewhere else if you desire)
Train the model: python synthesizer_train.py vctkp240 dataset_p240/SV2TTS/synthesizer --checkpoint_interval 100
Let it run for 200 to 400 iterations, then stop the program.
This should complete in a reasonable amount of time even on CPU.
You can safely stop and resume training at any time, though you will lose any progress since the last checkpoint.
Test the finetuned model in the toolbox using dataset_p240/p240_001.flac to generate the embedding
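For reference, the quoted directions condense into a short shell sketch (the logs-pretrained / logs-vctkp240 paths come from issue 437; everything else is an assumption about your checkout, and the training command is echoed rather than executed so you can review it before starting a long run):

```shell
# Sketch of the fine-tuning steps above; run from the Real-Time-Voice-Cloning
# root. Paths follow issue 437 and may differ in newer checkouts.
pretrained="synthesizer/saved_models/logs-pretrained"
finetune="synthesizer/saved_models/logs-vctkp240"

# Step 1: copy the pretrained model so fine-tuning starts from its weights
if [ -d "$pretrained" ]; then
  cp -r "$pretrained" "$finetune"
else
  echo "pretrained model not found at $pretrained (run from the repo root)" >&2
fi

# Step 2: train; stop after roughly 200-400 iterations. Stopping is safe,
# but progress since the last checkpoint is lost.
echo "python synthesizer_train.py vctkp240 dataset_p240/SV2TTS/synthesizer --checkpoint_interval 100"
```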
but the link no longer works, so I couldn't figure out the proper format for the files. Could you please help me with that?
Thanks again. :)
https://github.com/CorentinJ/Real-Time-Voice-Cloning/issues/819#issue-970736011
I used this issue to preprocess the Mozilla Common Voice dataset!
I've made public a repo with a workflow for creating a dataset to perform synthesizer fine tuning.
Not sure if this is the best place to let people know, but hopefully it helps someone.
Hi,
I want to use the trainer for cloning only my voice. The language would still be English, but with a different accent than the pre-trained models. Will the instructions in https://github.com/CorentinJ/Real-Time-Voice-Cloning/issues/437 still work to get good results?
I have a few more questions to get started.
Are the instructions mentioned in 437 no longer valid?
Is that sufficient for training a single voice model?
Thanks. :)