coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

Retraining vocoder (hifigan) and dvae in XTTS #3323

Closed cnlinxi closed 6 months ago

cnlinxi commented 9 months ago

🚀 Feature Description

Thank you for your contributions!

What is the way to retrain HiFi-GAN and DVAE in XTTS v2.0? I want to use a new training set, but its audio is only available at a 16 kHz sampling rate. Do these two parts need to be retrained?
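For context on the sampling-rate part of the question, here is a minimal sketch of upsampling 16 kHz audio by linear interpolation. The 22050 Hz target rate and the helper name `resample_linear` are assumptions for illustration and are not confirmed anywhere in this thread; a real pipeline would more likely use a band-limited resampler from an audio library.

```python
import math

def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono signal by linear interpolation (hypothetical helper)."""
    if not samples:
        return []
    last = len(samples) - 1
    n_out = int(round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        # Position of output sample i, measured in input-sample units.
        pos = i * src_rate / dst_rate
        lo = min(int(math.floor(pos)), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out

# One second of a 440 Hz tone recorded at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
# Assumed target rate; check the model configuration for the real value.
upsampled = resample_linear(tone, 16000, 22050)
```

Upsampling only changes the number of samples. It cannot restore frequency content above the 8 kHz Nyquist limit of the original 16 kHz recordings, which is why the question of retraining still matters.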

OnceJune commented 9 months ago

I have completed the HiFi-GAN and DVAE training code.

Cool, any plans to open source it?

ericwudayi commented 9 months ago

Hi @dy2009, this is a very cool and useful feature! I hope you can open source it too.

I found that the input to the HiFi decoder is the GPT latent, not the code indices themselves. I am wondering how to retrain the HiFi-GAN model.
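To make the distinction concrete, here is a toy sketch of the two kinds of input. Everything in it is hypothetical (the sizes, the random stand-in for the GPT, and the `fake_gpt_latents` helper); it only illustrates shapes and is not the XTTS code.

```python
import random

CODEBOOK_SIZE = 1024  # hypothetical number of discrete audio codes
LATENT_DIM = 8        # hypothetical latent width, kept tiny for display

def fake_gpt_latents(code_indices, dim=LATENT_DIM, seed=0):
    """Stand-in for a GPT forward pass: one float vector per code position."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in code_indices]

# Discrete view: one integer code per frame.
rng = random.Random(1)
code_indices = [rng.randrange(CODEBOOK_SIZE) for _ in range(5)]

# Continuous view: one float vector per frame, which is what the decoder consumes.
latents = fake_gpt_latents(code_indices)

print(len(code_indices), type(code_indices[0]).__name__)
print(len(latents), len(latents[0]), type(latents[0][0]).__name__)
```

If the decoder is fed latents rather than indices, its training pairs would presumably be (latent sequence, waveform), so the latents would have to be produced by running the GPT over the training audio first.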

zhangyue678 commented 9 months ago

I want to use a new training set, but its audio is only available at a 16 kHz sampling rate. Do these two parts need to be retrained?

Liujingxiu23 commented 8 months ago

Hi @dy2009, this is a very cool and useful feature! I hope you can open source it too.

I found that the input to the HiFi decoder is the GPT latent, not the code indices themselves. I am wondering how to retrain the HiFi-GAN model.

Have you figured out how to train or fine-tune the HiFi decoder?

Yaodada12 commented 8 months ago

Has anyone managed to train it successfully? I am planning to fine-tune the speaker_encoder. Any ideas on how to approach it?

stale[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

Liujingxiu23 commented 6 months ago

Has anyone retrained the model successfully? How is it done?

Liujingxiu23 commented 6 months ago

When you fine-tune the GPT model of XTTS v2, what does the loss look like? Roughly what value does loss_ce reach?

tuanh123789 commented 4 months ago

I think I have completed the HiFi decoder training part. The results were quite good when testing with the LJSpeech dataset. I will push it to GitHub soon, stay tuned.

tuanh123789 commented 4 months ago

The code is available here: https://github.com/tuanh123789/Train_Hifigan_XTTS