Output is a bit shaky, how to fix that?

bshall / knn-vc

Voice Conversion With Just Nearest Neighbors

Thanks for your interest in our work. There are probably many ways to improve things, and I'm not sure which one would work best. For some of your suggestions, here's how I predict they would change the output:

If you use HiFiGAN V2, I think it will do worse than HiFiGAN V1. V2 has far fewer parameters and scores much worse in the original HiFiGAN paper, so I do not think using V2 will improve the output.
As with just about every model, training on more data will probably improve things.

Training on the train-100 data from LibriSpeech took around one to two weeks on 3x Quadro RTX 6000 GPUs. I hope that helps!

Output is a bit shaky, how to fix that? #9