NVIDIA / radtts

Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low-Dimensional (F0 and Energy) Speech Attributes.

Train custom voice instead of the default ljs speaker. #31

Open rajdeep1337 opened 11 months ago

rajdeep1337 commented 11 months ago

I am attempting to train a custom speaker to use with the provided inference script, replacing the default ljs speaker. However, the inference outputs are muffled, similar to the problem described in this issue, and I'm not sure how to proceed.

My current training pipeline is:

1. Train the decoder; the checkpoints are saved to /decoder_checkpoints/ag_decoder.
2. Run warm-start training for the DAP model, with checkpoints saved to /dap_checkpoints/rad_ag.
3. At inference, pass the rad_ag checkpoint as the RADTTS checkpoint and use the provided vocoder checkpoint hifigan_libritts100360_generator0p5.pt.

As a result, the ag_decoder checkpoint appears to go unused. Am I making a mistake in my approach? Should the decoder and the DAP be trained on the same checkpoint path? You can refer to this colab notebook for more details. I would greatly appreciate your guidance through the process or any relevant documentation you can provide.
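One way I could check whether the warm-started rad_ag checkpoint already contains the decoder weights (which would explain why ag_decoder is never referenced at inference) is to compare the parameter names in both checkpoints. This is just a minimal sketch; the checkpoint filenames and the state-dict layout below are assumptions on my part, not necessarily what train.py writes:

```python
# Minimal sketch (not from the repo): compare parameter names in the two
# checkpoints to see whether the warm-started DAP checkpoint already
# carries the decoder weights. The filenames and the 'state_dict' key
# are assumptions and may not match what train.py actually saves.
import torch

def param_names(path):
    ckpt = torch.load(path, map_location="cpu")
    # the saved object may be the state dict itself or a wrapper dict
    state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
    return set(state.keys())

decoder_keys = param_names("decoder_checkpoints/ag_decoder/model_latest.pt")
dap_keys = param_names("dap_checkpoints/rad_ag/model_latest.pt")

shared = decoder_keys & dap_keys
print(f"decoder params: {len(decoder_keys)}")
print(f"dap params:     {len(dap_keys)}")
print(f"shared names:   {len(shared)}")
# If most decoder parameter names also appear in the DAP checkpoint, the
# warm start copied the decoder in, and ag_decoder is only needed once.
```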