Open Ranzige opened 3 years ago
20 minutes of audio per speaker seems too little for deep learning. I suggest at least 2 hours per speaker.
Thank you very much for your reply. I currently have about 40 minutes of audio per speaker, across about 170 speakers, but I only want to train on a few of them. Is there a method that uses the audio from the other, unused speakers to get better training results?
Unseen-speaker TTS is a different model that is out of scope for this project...
First of all, thank you for your excellent work. The dataset I currently have contains 8 speakers, each with about 20 minutes of audio. I trained the model following your method, but the validation results are poor. Can you help me see why?