Provides training, inference, and voice conversion recipes for RADTTS and RADTTS++: flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low-Dimensional (F0 and Energy) Speech Attributes.
RADTTS and RADTTS++ both have a first training stage (without the attribute predictor). How many steps are appropriate for this first stage?
Can anyone recommend a value?