SayaSS / vits-finetuning

Fine-Tuning your VITS model using a pre-trained model
MIT License
551 stars 86 forks

Recommended tips for finetuning #37

Open Koeru opened 1 year ago

Koeru commented 1 year ago

Hi, I tried to finetune the model with my own voice.

I recorded 100 audio samples for fine-tuning.

It generates my voice, but the intonation is not as good as that of the original pretrained models.

Do you have any tips for preparing the data or for fine-tuning?

SayaSS commented 1 year ago

Bad intonation is caused by rough phoneme annotation, and correcting the annotation takes a lot of time. So I don't recommend using this repository to fine-tune on the voices of real speakers.

You could use so-vits-svc or RVC to train a voice conversion model, then feed it Microsoft TTS output as the input audio. That gives you TTS in a different sense, and the result will be much better.

demo: https://huggingface.co/spaces/zomehwh/rvc-models
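The two-stage idea above (generic TTS first, voice conversion second) can be sketched roughly as below. This is only a minimal outline, not the actual interface of so-vits-svc, RVC, or Microsoft TTS: `synthesize`, `convert`, `tts_via_voice_conversion`, and the file names are hypothetical placeholders that you would wire up to whichever tools you actually use.

```python
from pathlib import Path
from typing import Callable

# Hypothetical stage signatures, NOT real library APIs:
#   synthesize(text, out_wav): write speech for `text` in any generic TTS voice
#   convert(src_wav, dst_wav): re-voice `src_wav` with the trained conversion model
Synthesize = Callable[[str, Path], None]
Convert = Callable[[Path, Path], None]


def tts_via_voice_conversion(
    text: str,
    work_dir: Path,
    synthesize: Synthesize,
    convert: Convert,
) -> Path:
    """Run text -> generic TTS audio -> voice-converted audio and return the result path."""
    work_dir.mkdir(parents=True, exist_ok=True)
    neutral_wav = work_dir / "tts_neutral.wav"      # intonation comes from the TTS engine
    converted_wav = work_dir / "tts_converted.wav"  # timbre comes from the conversion model

    # Stage 1: the TTS engine decides prosody and intonation.
    synthesize(text, neutral_wav)
    if not neutral_wav.exists():
        raise RuntimeError("TTS stage did not produce an audio file")

    # Stage 2: the voice conversion model replaces the speaker's timbre.
    convert(neutral_wav, converted_wav)
    if not converted_wav.exists():
        raise RuntimeError("voice conversion stage did not produce an audio file")

    return converted_wav
```

In practice, `synthesize` would wrap a call to your TTS service and `convert` would wrap the inference step of your trained so-vits-svc or RVC model. Keeping the two stages as plain callables makes it easy to swap either one out.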

Koeru commented 1 year ago

Thank you very much!!

Koeru commented 1 year ago

Hello! Thank you very much for your advice. I would like to ask one more question, if you don't mind🙇

I used the method above, and the output is very stable! However, are there other ways to make the intonation more natural and add emotion to the speech, the way this VITS model does?

I'm recording our own voices with voice actors, and would like to use those voices in a wider range of character speech situations.

Thank you

SayaSS commented 1 year ago

I'm sorry I can't provide better advice. Perhaps you could try asking at https://github.com/VOICEVOX/voicevox

Koeru commented 1 year ago

Got it. Thank you very much anyway.

qw4654134 commented 1 year ago

Can the pre-trained model be fine-tuned for Chinese? Thanks!