auspicious3000 / SpeechSplit

Unsupervised Speech Decomposition Via Triple Information Bottleneck
http://arxiv.org/abs/2004.11284
MIT License

Question: how many speakers were used to train the pre-trained model? #42

Closed: Kurei-Fujiwara closed this issue 1 year ago

Kurei-Fujiwara commented 3 years ago

Hi, thanks for the great work. Sorry for the rudimentary question. I have a question about the pre-trained model in demo.ipynb. The paper says the model was trained on 20 speakers, but the speaker ID vector used in demo.ipynb has a size of 82, which suggests it holds information for 82 speakers. Could you tell me how many speakers were used for the pre-trained model, and why the speaker ID in demo.ipynb has a size of 82?

auspicious3000 commented 1 year ago

The speaker embedding has 82 dimensions because there are 82 speakers in the dataset; only 20 of them were used for training.
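
In other words, the embedding length follows the total number of speakers in the dataset, not the number of speakers seen during training. A minimal sketch of how such a fixed-size one-hot speaker ID could be constructed, assuming `dim_spk_emb = 82` as in the released hparams; the constant names and the index-to-speaker mapping below are illustrative, not taken from the repo:

```python
import numpy as np

DIM_SPK_EMB = 82  # one slot per speaker in the full dataset (assumption: matches hparams)
TRAINED_SPEAKER_INDICES = range(20)  # illustrative: only a subset was used for training

def one_hot_speaker(speaker_index: int, dim: int = DIM_SPK_EMB) -> np.ndarray:
    """Return a one-hot vector identifying a single speaker out of `dim` slots."""
    vec = np.zeros((dim,), dtype=np.float32)
    vec[speaker_index] = 1.0
    return vec

# Example: speaker index 5 still produces a length-82 vector,
# even though only 20 of the 82 indices were used in training.
emb = one_hot_speaker(5)
print(emb.shape)  # (82,)
```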