How can I calculate suitable parameters?

Wendison / VQMIVC

Official implementation of VQMIVC: One-shot (any-to-any) Voice Conversion @ Interspeech 2021 + Online playing demo!

MIT License

Hi, for your questions: (1) 2000 was determined through multiple trials, and it gave the best performance; (2) I'm not sure whether increasing the batch size gives better results. First of all, since the number of training speakers is so large, you may need to keep the data balanced, i.e., a similar number of utterances per speaker. You also need to make sure the audio is relatively clean, without too much noise or reverberation; otherwise the speaker encoder may learn that harmful information. In my experience, you can use a pre-trained speaker encoder (e.g., one trained for a speaker recognition task) instead of training the speaker encoder from scratch. This can stabilize the training process and also improve conversion performance when the number of training speakers is large.
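One simple way to get a similar number of utterances per speaker is to cap each speaker at a fixed count before building the training list. The sketch below is only an illustration and is not part of the VQMIVC code: the function name `balance_by_speaker`, the `(speaker_id, path)` tuple layout, and the cap value are all assumptions.

```python
import random
from collections import defaultdict


def balance_by_speaker(utterances, max_per_speaker, seed=0):
    """Keep at most `max_per_speaker` randomly chosen utterances per speaker.

    `utterances` is a list of (speaker_id, path) tuples (hypothetical layout).
    Speakers with fewer utterances than the cap are kept unchanged.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible

    # Group utterance paths by speaker.
    by_speaker = defaultdict(list)
    for speaker_id, path in utterances:
        by_speaker[speaker_id].append(path)

    balanced = []
    for speaker_id in sorted(by_speaker):
        paths = by_speaker[speaker_id]
        if len(paths) > max_per_speaker:
            # Randomly subsample speakers that exceed the cap.
            paths = rng.sample(paths, max_per_speaker)
        balanced.extend((speaker_id, p) for p in paths)
    return balanced


if __name__ == "__main__":
    # Toy example: speaker "a" has 10 utterances, speaker "b" has 2.
    utts = [("a", f"a{i}.wav") for i in range(10)]
    utts += [("b", "b0.wav"), ("b", "b1.wav")]
    for speaker_id, path in balance_by_speaker(utts, max_per_speaker=3):
        print(speaker_id, path)
```

Capping only trims over-represented speakers; speakers with very few utterances stay under-represented, so you may also want to drop them or collect more data for them.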

How can I calculate suitable parameters? #22