Closed: Ca-ressemble-a-du-fake, closed 4 months ago

Hi,

Given a target speaker dataset, what is roughly the number of fine-tuning steps that should be run?

NeMo "recommends 1000 steps per minute of audio for fastpitch and 500 steps per minute of audio for HiFi-GAN."

Can the same general recommendation also apply to Toucan TTS when fine-tuning the pretrained Meta model on a given dataset? The goal is to find the sweet spot before overfitting appears.

Any advice appreciated,
Thanks in advance
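For concreteness, this is how I read the NeMo rule of thumb; a minimal sketch, where the function name is made up for illustration and the constants are taken from the quote above:

```python
def nemo_recommended_steps(minutes_of_audio: float) -> dict:
    """NeMo rule of thumb quoted above: 1000 finetuning steps per minute
    of audio for FastPitch, 500 per minute for HiFi-GAN."""
    return {
        "fastpitch": int(1000 * minutes_of_audio),
        "hifigan": int(500 * minutes_of_audio),
    }

# e.g. a 15-minute target-speaker dataset:
print(nemo_recommended_steps(15))  # {'fastpitch': 15000, 'hifigan': 7500}
```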
Yes, those numbers should work, provided the learning rate is dropped to 1/10 of its usual value. I would, however, never finetune the vocoder: it should already work just as well on unseen speakers as on seen ones, and finetuning a GAN is very tricky.
Also, for the FastPitch finetuning, I would use no more than 20k steps, regardless of how many minutes of audio are available. Beyond that point it would probably be better to train from scratch than to finetune.
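In other words, the per-minute budget from the question gets a hard ceiling; a minimal sketch, assuming the NeMo constant from the question and the 20k cap suggested here (the function name is made up):

```python
def fastpitch_finetuning_budget(minutes_of_audio: float) -> int:
    # 1000 steps per minute of audio (NeMo heuristic from the question),
    # capped at 20k steps: beyond that budget, training from scratch is
    # likely the better option.
    return min(int(1000 * minutes_of_audio), 20_000)

print(fastpitch_finetuning_budget(8))   # 8000
print(fastpitch_finetuning_budget(45))  # 20000 (cap applies)
```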
Ok, thank you. So should I change the learning rate in the finetuning script, or does it already take the 1/10 factor into account?
The current version does not change the learning rate in the finetuning script. This is because the ideal learning rate for your finetuning data is highly dependent on the number of datapoints used for finetuning: with few datapoints, lower finetuning learning rates are needed, but with lots of datapoints one can use the original learning rate without problems.
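So you would adjust it yourself; something like the following sketch, where the base rate, the cutoff, and the function name are all made-up illustration values rather than anything from the actual Toucan finetuning script:

```python
def finetuning_learning_rate(num_datapoints: int,
                             base_lr: float = 1e-3,  # assumed base rate, not Toucan's real default
                             cutoff: int = 500       # assumed "few datapoints" threshold
                             ) -> float:
    """Heuristic from the advice above: with few datapoints use roughly
    1/10 of the original learning rate; with plenty of datapoints keep
    the original rate."""
    return base_lr / 10 if num_datapoints < cutoff else base_lr

print(finetuning_learning_rate(200))   # 0.0001 (small finetuning set)
print(finetuning_learning_rate(5000))  # 0.001  (large finetuning set)
```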