Training Vocos on a single-speaker dataset

Hi,

I'd like to train Vocos on a single-speaker dataset similar to LJSpeech and would appreciate some guidance. I have a few questions.

Has anyone experimented with Vocos on single-speaker datasets such as LJSpeech, and if so, what were the metrics at convergence? How many steps should I train for on a single-speaker dataset? Also, which metrics should I watch to tell whether the model has converged?

Any help regarding this would be very valuable to me.

Thanks!

gemelo-ai / vocos