auspicious3000 / SpeechSplit

Unsupervised Speech Decomposition Via Triple Information Bottleneck
http://arxiv.org/abs/2004.11284
MIT License
636 stars 92 forks

Is the WaveNet vocoder very slow at synthesizing a voice? #69

Closed hhhuazi closed 1 year ago

hhhuazi commented 1 year ago

Hello, I used demo.ipynb to synthesize a voice from a mel-spectrogram, and it takes 5 minutes to synthesize a single voice. Isn't this too slow? Can I use HiFi-GAN's pretrained model directly? Thank you for your answer!

auspicious3000 commented 1 year ago

Yes, that's the purpose.
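
For context on the 5-minute figure: WaveNet-style vocoders are autoregressive, so every output sample is generated one after another. The sketch below only does the arithmetic. The 16 kHz sample rate and the 3-second utterance length are assumptions chosen for illustration, not values confirmed in this thread, and both function names are hypothetical.

```python
def autoregressive_steps(duration_s: float, sample_rate: int) -> int:
    """Number of sequential sampling steps an autoregressive vocoder
    needs to produce `duration_s` seconds of audio."""
    return int(round(duration_s * sample_rate))


def real_time_factor(synthesis_s: float, audio_s: float) -> float:
    """Seconds of compute spent per second of generated audio."""
    return synthesis_s / audio_s


# Assumed values: a 3-second utterance at 16 kHz (illustrative only).
steps = autoregressive_steps(3.0, 16000)
# 5 minutes of synthesis time, as reported in the question above.
rtf = real_time_factor(5 * 60, 3.0)
print(f"{steps} sequential steps, real-time factor {rtf:.0f}x")
```

Under those assumptions the vocoder runs tens of thousands of dependent steps per utterance, which is why a non-autoregressive vocoder such as HiFi-GAN is usually much faster.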

hhhuazi commented 1 year ago

Thank you for your answer! I found a pretrained HiFi-GAN model on GitHub and added it, but the synthesized voice has no speech content; it just sounds like noise. Why? Do I need to train the HiFi-GAN vocoder again on the VCTK dataset? Does the dataset need to be split?

auspicious3000 commented 1 year ago

Yes. But you can also use the HiFi-GAN model under autovc or autopst.
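
One common reason a pretrained vocoder outputs noise is that it was trained on mel-spectrograms computed with different settings than the ones being fed to it. A minimal sketch of a sanity check follows. The key names and every number in both dictionaries are hypothetical placeholders, not values read from SpeechSplit or from any HiFi-GAN checkpoint, so substitute the real settings from each project before drawing conclusions.

```python
def mel_config_mismatches(extractor_cfg, vocoder_cfg, keys=None):
    """Return {key: (extractor_value, vocoder_value)} for every
    mel-spectrogram setting that differs between the two configs."""
    if keys is None:
        keys = ("sampling_rate", "n_fft", "hop_size", "win_size",
                "num_mels", "fmin", "fmax")
    return {
        k: (extractor_cfg.get(k), vocoder_cfg.get(k))
        for k in keys
        if extractor_cfg.get(k) != vocoder_cfg.get(k)
    }


# Hypothetical settings used to compute the mels fed to the vocoder.
feature_cfg = {"sampling_rate": 16000, "n_fft": 1024, "hop_size": 256,
               "win_size": 1024, "num_mels": 80, "fmin": 90, "fmax": 7600}
# Hypothetical settings the downloaded vocoder was trained with.
pretrained_cfg = {"sampling_rate": 22050, "n_fft": 1024, "hop_size": 256,
                  "win_size": 1024, "num_mels": 80, "fmin": 0, "fmax": 8000}

for key, (ours, theirs) in mel_config_mismatches(feature_cfg, pretrained_cfg).items():
    print(f"{key}: features use {ours}, vocoder expects {theirs}")
```

This check does not cover differences in amplitude scaling or normalization of the mel values, which can also cause a mismatch. If anything differs, the options are to retrain the vocoder on matching features or to use a vocoder already trained on them.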

hhhuazi commented 1 year ago

Are there any precautions to take when retraining the vocoder?

auspicious3000 commented 1 year ago

It should be straightforward.
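
On the earlier question about splitting the dataset, a minimal sketch of a reproducible train/validation split over a list of audio files is shown here. The function name, the 5% validation fraction, the seed, and the file paths are all hypothetical. The list format a particular vocoder's training script expects is not specified in this thread, so the output would need adapting.

```python
import random


def split_file_list(paths, val_fraction=0.05, seed=1234):
    """Shuffle file paths deterministically and hold out a validation subset.

    Returns (train_paths, val_paths).
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = sorted(paths)               # sort first so the result is order-independent
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Placeholder paths standing in for a real corpus directory listing.
wavs = [f"speaker_{i // 10:02d}/utt_{i:03d}.wav" for i in range(100)]
train_files, val_files = split_file_list(wavs)
print(len(train_files), "training files,", len(val_files), "validation files")
```

Sorting before the seeded shuffle keeps the split identical across runs even if the directory listing order changes.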