Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low-Dimensional (F0 and Energy) Speech Attributes.
MIT License · 283 stars · 40 forks
Train custom voice instead of the default ljs speaker. #31
I am attempting to train a custom speaker for use with the provided inference script, replacing the default ljs speaker. However, the inference outputs are muffled, similar to the problem described in this issue, and I'm uncertain how to proceed. In my current pipeline, I first train the decoder, saving checkpoints to /decoder_checkpoints/ag_decoder. I then run warm-start training for the dap model, saving to /dap_checkpoints/rad_ag. At inference time, I pass the rad_ag checkpoint as the rad_tts checkpoint and use the provided vocoder checkpoint hifigan_libritts100360_generator0p5.pt. As a result, the ag_decoder checkpoint appears to go unused. Am I making a mistake in my approach? Should I train the decoder and the dap on the same checkpoint path? You can refer to this colab notebook for more details. I would greatly appreciate any guidance through the process or any relevant documentation you can provide.