NVIDIA / radtts

Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low-Dimensional (F0 and Energy) Speech Attributes.

Train custom voice instead of the default ljs speaker. #31

Open rajdeep1337 opened 11 months ago

rajdeep1337 commented 11 months ago

I am attempting to train a custom speaker to use with the provided inference script, replacing the default ljs speaker. However, the inference outputs are muffled, similar to the problem described in this issue, and I'm not sure how to proceed.

My current training pipeline is:

1. Train the decoder; the checkpoints are saved to /decoder_checkpoints/ag_decoder.
2. Run warm-start training for the DAP model, with checkpoints saved to /dap_checkpoints/rad_ag.
3. At inference, pass the rad_ag checkpoint as the RADTTS checkpoint and use the provided vocoder checkpoint hifigan_libritts100360_generator0p5.pt.

As a result, the ag_decoder checkpoint appears to go unused. Am I making a mistake in my approach? Should the decoder and the DAP be trained on the same checkpoint path? You can refer to this colab notebook for more details. I would greatly appreciate your guidance through the process or any relevant documentation you can provide.
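One way I could check whether the warm-started rad_ag checkpoint already contains the decoder weights (which would explain why ag_decoder is never referenced at inference) is to compare the parameter names in both checkpoints. This is just a minimal sketch; the checkpoint filenames and the state-dict layout below are assumptions on my part, not necessarily what train.py writes:

```python
# Minimal sketch (not from the repo): compare parameter names in the two
# checkpoints to see whether the warm-started DAP checkpoint already
# carries the decoder weights. The filenames and the 'state_dict' key
# are assumptions and may not match what train.py actually saves.
import torch

def param_names(path):
    ckpt = torch.load(path, map_location="cpu")
    # the saved object may be the state dict itself or a wrapper dict
    state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
    return set(state.keys())

decoder_keys = param_names("decoder_checkpoints/ag_decoder/model_latest.pt")
dap_keys = param_names("dap_checkpoints/rad_ag/model_latest.pt")

shared = decoder_keys & dap_keys
print(f"decoder params: {len(decoder_keys)}")
print(f"dap params:     {len(dap_keys)}")
print(f"shared names:   {len(shared)}")
# If most decoder parameter names also appear in the DAP checkpoint, the
# warm start copied the decoder in, and ag_decoder is only needed once.
```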