auspicious3000 / SpeechSplit

Unsupervised Speech Decomposition Via Triple Information Bottleneck
http://arxiv.org/abs/2004.11284
MIT License

How to generate speech from the output of the trained Generator? #55

Open 6lyx opened 2 years ago

6lyx commented 2 years ago

I have trained the Generator model on my own data. However, there does not seem to be any code for generating speech from the trained Generator. I checked "demo.ipynb" to find out how, and it indicates that a trained F0_Converter is needed. So I would like to ask the author: is it necessary to train an F0_Converter first in order to generate speech from the trained Generator? (I could not find any code for training the F0_Converter.) Or can we just use the pretrained F0_Converter?

auspicious3000 commented 2 years ago

If your data is very different from VCTK, you probably need to re-train the F0 converter.

6lyx commented 2 years ago

Many thanks for the quick answer. I am now using speech sampled at 44,100 Hz. Does that mean it is necessary to retrain both the F0_Converter and the wavegen model? I have also found that the speech I generate is much shorter than the original speech (using my trained G model and the pretrained wavegen model provided in this project).

auspicious3000 commented 2 years ago

Yes. In that case, you probably need to tweak other parts of the model as well.
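As a rough illustration of why 44,100 Hz audio does not drop straight into a pipeline built around 16 kHz audio, the sketch below counts spectrogram frames and the audio length a vocoder would emit for them. It assumes a hop length of 256 samples and a vocoder that writes 256 samples per frame at 16 kHz; both values are illustrative assumptions, not settings read from the SpeechSplit code. It only shows that frame timing stops lining up when the sample rate changes; it does not by itself explain the shorter output reported above.

```python
# Illustrative sketch only. HOP_LENGTH and TRAIN_SR are ASSUMED values,
# not settings taken from the SpeechSplit repository.
HOP_LENGTH = 256   # samples between consecutive spectrogram frames (assumed)
TRAIN_SR = 16_000  # sample rate the pretrained vocoder expects (assumed)


def frames_per_second(sample_rate: int, hop_length: int = HOP_LENGTH) -> float:
    """Number of spectrogram frames produced per second of input audio."""
    return sample_rate / hop_length


def vocoder_duration(n_frames: int,
                     vocoder_sr: int = TRAIN_SR,
                     hop_length: int = HOP_LENGTH) -> float:
    """Seconds of audio emitted if the vocoder writes hop_length samples per frame at vocoder_sr."""
    return n_frames * hop_length / vocoder_sr


if __name__ == "__main__":
    clip_seconds = 3.0
    for sr in (16_000, 44_100):
        # Frames extracted from a clip recorded at `sr` with the same hop length.
        n = round(clip_seconds * frames_per_second(sr))
        print(f"{sr} Hz input: {n} frames -> {vocoder_duration(n):.2f} s from a 16 kHz vocoder")
```

The same hop length covers about 2.8 times less time at 44,100 Hz than at 16 kHz, so every stage that assumes a fixed frame rate (crop lengths, padding, the F0 contour, the vocoder) sees differently timed input. That is consistent with the advice to retrain or adjust those parts, or alternatively to resample the data to the rate the pretrained models were built for.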