Wendison / VQMIVC

Official implementation of VQMIVC: One-shot (any-to-any) Voice Conversion @ Interspeech 2021 + Online playing demo!
MIT License

Mel stats and Vocoder #32

Open winddori2002 opened 2 years ago

winddori2002 commented 2 years ago

Hi, I am trying to reproduce your paper and have run into a problem with the mel stats and the vocoder. When I use your pre-trained vocoder together with your provided mel stats, the synthesized speech quality is quite good. However, when I run the preprocessing code and compute new mel stats, the quality degrades with the same pre-trained vocoder. So my questions are:

1) If I compute new mel stats, is it necessary to train the vocoder again?
2) Did you use the mel stats from the preprocessing code for vocoder input normalization?

Thank you

Wendison commented 2 years ago

Hi, in my experience, using the same mel stats for the vocoder and the VC model leads to better voice quality. To answer your questions: 1) I think training a vocoder with the new mel stats would produce higher-quality speech; alternatively, you can use the mel stats I provided (from the PWG vocoder trained on VCTK) to normalize the mels when training the VC model. 2) The mel stats used for vocoder input normalization do not come from the preprocessing code; they come from the PWG repo's mel preprocessing.
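
To illustrate why mismatched stats hurt, here is a minimal sketch, not the repo's actual code. It assumes the stats are a per-bin mean and scale used for standardization, and every name and number in it (`normalize_mel`, `vocoder_mean`, `new_mean`, the synthetic mel) is hypothetical. It shows that the same mel, normalized with stats other than those the vocoder was trained on, reaches the vocoder shifted and rescaled.

```python
import numpy as np


def normalize_mel(mel, mean, scale):
    """Standardize a (frames, n_mels) log-mel spectrogram per mel bin."""
    return (mel - mean) / scale


def denormalize_mel(mel_norm, mean, scale):
    """Invert normalize_mel using the same stats."""
    return mel_norm * scale + mean


# Synthetic stand-in for a log-mel spectrogram (assumption: 80 mel bins).
rng = np.random.default_rng(0)
n_mels = 80
mel = rng.normal(loc=-4.0, scale=2.0, size=(100, n_mels))

# Hypothetical stats the pre-trained vocoder was trained with.
vocoder_mean = np.full(n_mels, -4.0)
vocoder_scale = np.full(n_mels, 2.0)

# Hypothetical stats recomputed by a different preprocessing run.
new_mean = np.full(n_mels, -3.0)
new_scale = np.full(n_mels, 1.5)

# Input the vocoder expects vs. input it gets with the new stats.
matched = normalize_mel(mel, vocoder_mean, vocoder_scale)
mismatched = normalize_mel(mel, new_mean, new_scale)

# Mean absolute gap between the two normalized inputs.
mismatch = np.abs(matched - mismatched).mean()
print(f"mean absolute mismatch: {mismatch:.3f}")

# With matching stats, normalization round-trips back to the original mel.
restored = denormalize_mel(matched, vocoder_mean, vocoder_scale)
```

Under these assumptions, either option above removes the gap: normalize the VC training mels with the vocoder's stats, or retrain the vocoder on mels normalized with the new stats.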

winddori2002 commented 2 years ago

Thank you for answering! I understand now and have solved it.