SayaSS / vits-finetuning

Fine-Tuning your VITS model using a pre-trained model
MIT License

Recommended number of steps? #36

Open hopto-dot opened 1 year ago

hopto-dot commented 1 year ago

> About 50 audio-text pairs will suffice, and 100-600 epochs can give quite good performance, but more data may be better.

600 epochs with what batch size? 1? The default 16 in the Google Colab?

Better yet, do you know of a good rough formula for steps based on audio samples?

SayaSS commented 1 year ago

In this repo, steps × batch size = total number of samples × epochs. If the total number of samples and the number of epochs stay fixed, changing the batch size also changes the step count, which is why I chose epochs as the reference for fine-tuning progress. For most cases, a batch size of 16 is fine.
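The relation above can be sketched as a small helper for converting between epochs and steps (the function name and rounding choice are mine, not from the repo; the real trainer may drop or pad the last partial batch, so treat this as an estimate):

```python
import math

def estimate_steps(num_samples: int, epochs: int, batch_size: int) -> int:
    """Estimate total optimizer steps from the relation
    steps * batch_size = num_samples * epochs.

    Ignores how the trainer handles the final partial batch of each
    epoch, so the true count may differ slightly.
    """
    return math.ceil(num_samples * epochs / batch_size)

# Example from this issue: ~50 audio-text pairs, 600 epochs, batch size 16
print(estimate_steps(50, 600, 16))  # -> 1875
```

So the suggested 100-600 epochs on ~50 samples works out to roughly 300-1900 steps at the default batch size of 16.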