Closed by Selectorrr 1 year ago
Hi @Selectorrr, masking conditioning is used by default in this recipe.
I think the repetition issues are related to overfitting. If you train the model for too long on a small dataset, it easily overfits. You should stop training after 1 to 3 epochs. I recommend you try an early checkpoint.
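For anyone wanting to translate "1 to 3 epochs" into checkpoint step numbers, here is a rough sketch. It is plain Python and does not call the training library; the function name, its parameters, and the example numbers are all hypothetical, so substitute the values from your own training config.

```python
import math


def early_checkpoint_steps(num_samples, batch_size, save_step, max_epochs=3, grad_accum=1):
    """List checkpoint step numbers saved within the first `max_epochs` epochs.

    All arguments are illustrative assumptions, not names from any recipe:
    num_samples - number of training samples in the dataset
    batch_size  - samples per batch
    save_step   - a checkpoint is assumed to be written every `save_step` optimizer steps
    grad_accum  - gradient accumulation steps per optimizer step
    """
    # One optimizer step consumes batch_size * grad_accum samples.
    steps_per_epoch = math.ceil(num_samples / (batch_size * grad_accum))
    last_step = steps_per_epoch * max_epochs
    return list(range(save_step, last_step + 1, save_step))


# Hypothetical numbers: 1200 samples, batch size 4, checkpoint every 250 steps.
print(early_checkpoint_steps(num_samples=1200, batch_size=4, save_step=250))
# prints [250, 500, 750]
```

With these made-up numbers one epoch is 300 steps, so any checkpoint past step 900 has already seen the data more than three times and is a candidate for the overfitting described above.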
@Edresson You may be right. I have tried a small number of epochs, but my dataset is quite large. I'll try a dataset of about 2 hours with not too many epochs and report the results later.
@Edresson Yes, indeed: with about 2 hours of audio and 1 epoch of training there are no problems and everything works well. I was originally planning to train the model on multiple speakers and a larger dataset, so I will wait for updates. Thanks a lot, you guys are awesome.
Describe the bug
If we follow the instructions in this recipe, the trained model repeats the prompt in the output audio.
As noted on @erogol's blog, in order to fix this problem in the XTTS v1.1 model:
It remains unclear how to use the same conditioning method when training a model using a recipe.
To Reproduce
Training a model using a recipe
Environment