Hi,
I'm planning to train Vocos on a single-speaker dataset similar to LJSpeech and would appreciate some guidance. I have a few questions.
Has anyone experimented with training Vocos on a single-speaker dataset such as LJSpeech? If so, what were the metric values at convergence? Roughly how many steps should I train for on a single-speaker dataset? And which metrics should I watch to tell whether the model has converged?
Any help regarding this would be very valuable to me.
Thanks!