Closed smlim01 closed 1 year ago
Hello. I am trying to train the model on a multi-speaker dataset like VCTK.
I am going to prepare a .scp file, extract features, and train as explained in the README.
Are there any additional steps for training on many speakers? For example, preparing speaker labels or computing each speaker's stats separately?

You don't need any additional steps for multi-speaker training if you feed WORLD features into the neural vocoder. The WORLD features carry enough speaker information that the vocoder can generate each speaker's voice without explicit speaker conditioning. You can also compute the feature statistics over all speakers combined.

Thank you! It was a simple question, so I will close this issue.
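For anyone landing here later, a minimal sketch of the .scp preparation step with all speakers pooled into one list. The `wav48/<speaker>/*.wav` layout below is an assumption mirroring VCTK's directory structure, and the dummy files are created only to make the snippet self-contained; in practice you would point `find` at your actual corpus root.

```shell
# Sketch only: fake a VCTK-like layout (wav48/<speaker>/*.wav) for illustration.
mkdir -p demo/wav48/p225 demo/wav48/p226
touch demo/wav48/p225/p225_001.wav demo/wav48/p226/p226_001.wav

# Build ONE combined .scp covering every speaker. No per-speaker split is
# needed, since feature stats can be computed over all speakers together.
find demo/wav48 -name '*.wav' | sort > demo/train.scp

cat demo/train.scp   # one path per utterance, all speakers mixed
```

Because the vocoder needs no speaker conditioning, this single pooled list is all the speaker handling required; feature extraction and stats computation then run over `demo/train.scp` as for a single-speaker setup.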