bshall / knn-vc

Voice Conversion With Just Nearest Neighbors
https://bshall.github.io/knn-vc/

Output is a bit shaky, how to fix that? #9

Closed: Souvic closed this issue 1 year ago

Souvic commented 1 year ago

Thanks for the great work and for making the code and all the weights available! I really appreciate it.

Can you please guide me on how to improve the output further? If we change the vocoder to HiFi-GAN V2 or train on more data, how do you think the output would change?

Also, how long does it take to train on the train-100 data from LibriSpeech?

RF5 commented 1 year ago

Thanks for your interest in our work. There could be many ways to improve things, and I'm not sure which one is best. For some of your suggestions, here's how I predict they might change the output:

Training on the train-100 data from LibriSpeech took around one to two weeks on 3x Quadro RTX 6000 GPUs. I hope that helps!