Choddeok / EmoSphere-TTS

The official implementation of EmoSphere-TTS

Missing pretrained Vocoder model (BigV_16k) #3

Closed: G-Thor closed this issue 2 weeks ago

G-Thor commented 2 months ago

I am attempting to train a model using your code.

In the base config file used for model training (e.g. when running sh train_run.sh), a pretrained vocoder model seems to be required:
https://github.com/Choddeok/EmoSphere-TTS/blob/88c237bc039e3d2e08476b9961bbf3fb94db89ec/egs/egs_bases/tts/base.yaml#L44-L45

However, the README doesn't mention this, and I can't find any explanation of where this model should come from. I also could not find any reference to a 16 kHz pretrained model in the official BigVGAN repo. Could you shed some light on where this pretrained model can be found, or how it was trained in the first place?

Thanks in advance!

Choddeok commented 1 month ago

First, I apologize for not sharing the training process and checkpoint for bigV_16k. It is currently difficult for us to make this vocoder publicly available. Instead, you can download a pretrained checkpoint from HiFi-GAN and adjust the corresponding parameters, which should let you proceed without much difficulty.
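One likely part of "adjusting the parameters" when swapping in a HiFi-GAN checkpoint is making the acoustic model's mel-spectrogram settings agree with the ones the vocoder was trained on. A vocoder given mels computed with a different sample rate, FFT size, hop size, or frequency range will generally produce degraded audio. The following is a minimal sketch of such a consistency check, not part of EmoSphere-TTS. The key names in the hypothetical KEY_MAP are assumptions: the left side follows NATSpeech-style TTS config names and the right side follows the config.json files in the original HiFi-GAN repository. The TTS values in the demo are illustrative only. Verify all of them against your actual files.

```python
"""Hypothetical sanity check: do a TTS config's mel settings match a vocoder's?

All key names are assumptions for illustration; check them against the real
EmoSphere-TTS YAML config and the HiFi-GAN checkpoint's config.json.
"""
import json

# Hypothetical mapping: TTS config key -> HiFi-GAN config key.
KEY_MAP = {
    "audio_sample_rate": "sampling_rate",
    "audio_num_mel_bins": "num_mels",
    "fft_size": "n_fft",
    "hop_size": "hop_size",
    "win_size": "win_size",
    "fmin": "fmin",
    "fmax": "fmax",
}


def find_mel_mismatches(tts_cfg: dict, vocoder_cfg: dict) -> dict:
    """Return {tts_key: (tts_value, vocoder_value)} for every setting that differs.

    A key missing from one config shows up as None on that side; a key missing
    from both configs is treated as matching and is not reported.
    """
    mismatches = {}
    for tts_key, voc_key in KEY_MAP.items():
        tts_val = tts_cfg.get(tts_key)
        voc_val = vocoder_cfg.get(voc_key)
        if tts_val != voc_val:
            mismatches[tts_key] = (tts_val, voc_val)
    return mismatches


if __name__ == "__main__":
    # Illustrative vocoder settings in the style of a HiFi-GAN config.json;
    # in practice, load the real file with json.load().
    vocoder_cfg = json.loads(
        '{"sampling_rate": 22050, "num_mels": 80, "n_fft": 1024,'
        ' "hop_size": 256, "win_size": 1024, "fmin": 0, "fmax": 8000}'
    )
    # Hypothetical TTS settings for a 16 kHz setup; in practice, load your YAML config.
    tts_cfg = {"audio_sample_rate": 16000, "audio_num_mel_bins": 80,
               "fft_size": 1024, "hop_size": 256, "win_size": 1024,
               "fmin": 0, "fmax": 8000}
    for key, (ours, theirs) in find_mel_mismatches(tts_cfg, vocoder_cfg).items():
        print(f"{key}: TTS config has {ours}, vocoder expects {theirs}")
```

With these illustrative values, only the sample rate is reported. In that situation, you would either resample the data and retrain the acoustic model to match the vocoder, or fine-tune or retrain the vocoder on mels produced with your settings.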

We plan to either release the checkpoint for bigV_16k in the future or modify the code to support training with HiFi-GAN directly.

Thank you for your understanding.

G-Thor commented 1 month ago

Thanks for your reply! I'll try it out with HiFi-GAN instead.