PlayVoice / whisper-vits-svc

Core Engine of Singing Voice Conversion & Singing Voice Clone
https://huggingface.co/spaces/maxmax20160403/sovits5.0
MIT License
2.57k stars, 914 forks

whisper and hubert #127

Open wblgers opened 9 months ago

wblgers commented 9 months ago

Hi,

After reading the code, I found that the whisper encoder output is used as the PPG and HuBERT is used as the Vec. Is the HuBERT here the discrete HuBERT after k-means, HuBERT soft, or just a HuBERT hidden-layer output? And what is the advantage of mixing PPG and Vec?

Thanks~

MaxMax2016 commented 9 months ago

Whisper is used so that each word is pronounced clearly, and HuBERT soft is used to make up for pronunciation details.

wblgers commented 9 months ago

Did you train a Chinese version of HuBERT soft? Is there a reference?

MaxMax2016 commented 9 months ago

https://github.com/fishaudio/chinese-hubert-soft

wblgers commented 9 months ago

OK, thanks. I'll try to train a Chinese HuBERT soft using more data.

panxin801 commented 1 month ago

Thanks for the question. I'm wondering what will happen if I remove the whisper PPG as input, for example by feeding in a fake whisper PPG (such as all zeros). Have you tried something like this before?
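
For reference, a minimal sketch of the all-zeros PPG idea described above. It is not taken from this repository: the helper name `make_zero_ppg`, the list-of-lists layout (`[time][feature_dim]`), and the toy shapes are all assumptions made for illustration. With NumPy arrays, `np.zeros_like(ppg)` would do the same job.

```python
from typing import List

# Plain-list stand-in for a feature matrix laid out as [time][feature_dim].
# This layout is an assumption for illustration, not the repository's format.
Frames = List[List[float]]


def make_zero_ppg(ppg: Frames) -> Frames:
    """Return an all-zero matrix with the same shape as `ppg` (hypothetical helper)."""
    return [[0.0] * len(frame) for frame in ppg]


# Toy shapes only: 3 frames, a 4-dim "PPG" and a 2-dim "Vec".
# Real feature dimensions are much larger.
ppg = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]
vec = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Replace only the PPG; the HuBERT features are passed through unchanged.
fake_ppg = make_zero_ppg(ppg)
same_vec = vec
```

The sketch only builds the fake input. Whether to feed it at inference only or also during training is a separate choice that it does not cover.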