gemelo-ai / vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
https://gemelo-ai.github.io/vocos/
MIT License

Training vocos on a single speaker dataset #46

Closed: bharathraj-v closed 8 months ago

bharathraj-v commented 9 months ago

Hi,

I'd like to train Vocos on a single-speaker dataset similar to LJSpeech, and I have a few questions.

Has anyone experimented with Vocos on single-speaker datasets such as LJSpeech, and if so, what were the metrics at convergence? How many steps should I train for on a single-speaker dataset? Also, which metrics should I watch to tell whether the model has converged?

Any help regarding this would be very valuable to me.

Thanks!