Could you tell me why the speaker ID is used?

NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data

BSD 3-Clause "New" or "Revised" License

Open Moon-sung-woo opened 3 years ago

Moon-sung-woo commented 3 years ago

First of all, thank you for your code.

I read the paper and analyzed the code, but I couldn't understand why the speaker ID is used.

Do I need a speaker ID to train the features, or is it used to extract features at inference time?
Does the speaker ID help express emotion?
Do you use the speaker ID to distinguish the characteristics of each speaker?

I'd appreciate it if you could give me an answer.