auspicious3000 / autovc

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
https://arxiv.org/abs/1905.05879
MIT License
990 stars 205 forks

Pretrained model for speaker encoder #99

Closed anitaweng closed 3 years ago

anitaweng commented 3 years ago

Hi, nice work!

I am trying to reproduce the VCTK results. Is the pretrained speaker encoder provided in your README trained on the combination of VoxCeleb1 and LibriSpeech, as mentioned in your paper, or only on a small set of VCTK speakers? Also, is the pretrained WaveNet vocoder trained on the whole VCTK dataset or only a part of it?

Thanks a lot!

auspicious3000 commented 3 years ago

The GE2E speaker encoder wouldn't work well if trained on a small dataset.

anitaweng commented 3 years ago

So the provided pretrained speaker encoder is trained on the combination of VoxCeleb1 and Librispeech, and the provided wavenet vocoder is just trained on vctk. Am I right?
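As context for the speaker-encoder discussion above: AutoVC conditions on a fixed-size speaker embedding obtained by averaging per-utterance GE2E embeddings for a speaker. Below is a minimal NumPy sketch of that averaging step; `embed_utterance` here is a hypothetical stand-in (mean-pool plus L2-normalize) for a real GE2E forward pass, not the repo's actual encoder.

```python
import numpy as np

def embed_utterance(mel):
    # Hypothetical stand-in for a GE2E speaker-encoder forward pass:
    # mean-pool the mel frames over time and L2-normalize, yielding a
    # fixed-size vector per utterance (for illustration only).
    v = mel.mean(axis=0)
    return v / np.linalg.norm(v)

def speaker_embedding(utterances):
    # Average the per-utterance embeddings for one speaker, then
    # renormalize to get a single speaker-level embedding.
    embs = np.stack([embed_utterance(u) for u in utterances])
    avg = embs.mean(axis=0)
    return avg / np.linalg.norm(avg)

rng = np.random.default_rng(0)
# 10 fake mel-spectrograms of shape (frames, n_mels)
utts = [rng.random((128, 80)) for _ in range(10)]
emb = speaker_embedding(utts)
print(emb.shape)  # (80,)
```

With a real GE2E model, a new (unseen) speaker's embedding is computed the same way from a handful of their utterances, which is what enables zero-shot conversion.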

anitaweng commented 3 years ago

Thanks for your reply. Then if I train a new AutoVC model on the VCTK dataset with your original settings, using the pretrained speaker encoder and pretrained WaveNet model, can I reproduce results as good as yours for unseen VCTK speakers?

auspicious3000 commented 3 years ago

Ideally yes.

anitaweng commented 3 years ago

Thanks for your reply.