begeekmyfriend / tacotron2

Forked from NVIDIA/tacotron2 and merged with Rayhane-mamah/Tacotron-2
BSD 3-Clause "New" or "Revised" License

Results of multi-speaker training #32

Open Ranzige opened 3 years ago

Ranzige commented 3 years ago

First of all, thank you for your excellent work. My current dataset contains 8 speakers, each with about 20 minutes of audio. I trained following your method, but the model's validation results are poor. (Attached image: IMG_20201204_111058) Can you help me see why?

begeekmyfriend commented 3 years ago

20 minutes of audio per speaker seems too little for deep learning. I suggest at least 2 hours.
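
To check a corpus against the 2-hour suggestion above, per-speaker totals can be computed before training. The following is only a minimal sketch and is not part of this repository: the function name `speakers_below_minimum`, the `(speaker_id, duration_in_seconds)` input format, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def speakers_below_minimum(clips, min_hours=2.0):
    """Return {speaker: total_hours} for speakers with less audio than min_hours.

    `clips` is an iterable of (speaker_id, duration_in_seconds) pairs.
    This input format is an assumption, not the repository's data layout.
    """
    totals = defaultdict(float)
    for speaker, seconds in clips:
        totals[speaker] += seconds  # accumulate seconds per speaker

    # Convert to hours and keep only speakers under the threshold.
    return {
        speaker: total / 3600.0
        for speaker, total in totals.items()
        if total / 3600.0 < min_hours
    }


# Hypothetical data: spk_a has 20 minutes in total, spk_b has 2.5 hours.
clips = [("spk_a", 600.0), ("spk_a", 600.0), ("spk_b", 9000.0)]
short = speakers_below_minimum(clips)
print(short)
```

Only `spk_a` is reported here, since `spk_b` exceeds the default 2-hour minimum. The durations themselves would have to come from the actual audio files.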

Ranzige commented 3 years ago

Thank you very much for your reply. I currently have about 40 minutes of audio per speaker, across about 170 speakers, but I only want to train a few of them. Is there a method that uses the audio from the other, unused speakers to get better training results?

begeekmyfriend commented 3 years ago

Unseen-speaker TTS is a different model and is out of scope for this project...