NVIDIA / radtts

Provides training, inference, and voice conversion recipes for RADTTS and RADTTS++: flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low-Dimensional (F0 and Energy) Speech Attributes.
MIT License

Training for singing models #29

Open sjkoelle opened 1 year ago

sjkoelle commented 1 year ago

We are trying to train a singing model. We are satisfied with the timbre produced by the decoder: it sounds like singing, at least when using ground-truth features from the training data. However, the lyrics are usually not recognizable, at least after the amount of training that typically yields recognizable speech from text. We know the phoneme encodings are reasonable, since we can train text-to-speech models, and we have tried warm-starting from a text-to-speech model. Have you trained a singing model, and if so, what sort of data and training curriculum did you use? Thanks!