PlayVoice / whisper-vits-svc

Core Engine of Singing Voice Conversion & Singing Voice Clone
https://huggingface.co/spaces/maxmax20160403/sovits5.0
MIT License

How to make use of GRL for speaker #19

Closed: gtonkov closed this issue 1 year ago

gtonkov commented 1 year ago

Hi,

How can I use the SpeakerClassifier in vits/modules_grl.py?

It was added in this commit, but I cannot see it being used anywhere during training.

Thanks!

MaxMax2016 commented 1 year ago

https://github.com/PlayVoice/so-vits-svc-5.0/blob/main/vits/modules_grl.py will be used to fine-tune the model after the base model has been trained. I am still training the base model now.

This code will be added for fine-tuning:

    x = self.enc(x * x_mask, x_mask)

    # Speaker classifier
    speaker_posteriors = self.speaker_classifier(x)

    # Loss
    self.nll_loss = nn.NLLLoss()
    adv_loss = self.nll_loss(speaker_posteriors, speaker_targets[index])

However, the speaker vector will be used for the loss.

More information on GRL can be found here: https://github.com/ubisoft/ubisoft-laforge-daft-exprt/blob/master/src/daft_exprt/model.py
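For readers unfamiliar with GRL (gradient reversal layer), here is a minimal PyTorch sketch of the general technique, not the code in vits/modules_grl.py. `GradientReversal`, `ToySpeakerClassifier`, the tensor shapes, and the `lambd` scale are all assumptions made for illustration. It uses integer speaker IDs with `nn.NLLLoss`, as in the snippet above, rather than a speaker-vector loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by -lambd in backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # lambd is a plain float, so it receives no gradient.
        return -ctx.lambd * grad_output, None


class ToySpeakerClassifier(nn.Module):
    """Hypothetical stand-in for a speaker classifier (not the repository's class)."""

    def __init__(self, hidden_dim, n_speakers, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.proj = nn.Linear(hidden_dim, n_speakers)

    def forward(self, x):
        # x: [batch, hidden_dim, time]. Reverse gradients, pool over time, classify.
        x = GradientReversal.apply(x, self.lambd)
        pooled = x.mean(dim=2)
        return F.log_softmax(self.proj(pooled), dim=1)


torch.manual_seed(0)
# Assumed shapes: 4 utterances, 16 channels, 50 frames, 3 speakers.
encoder_out = torch.randn(4, 16, 50, requires_grad=True)
speaker_targets = torch.tensor([0, 1, 2, 1])

classifier = ToySpeakerClassifier(hidden_dim=16, n_speakers=3)
speaker_posteriors = classifier(encoder_out)
adv_loss = nn.NLLLoss()(speaker_posteriors, speaker_targets)
adv_loss.backward()
```

The classifier's own weights are updated to predict the speaker, while the reversed gradient reaching `encoder_out` pushes the upstream encoder toward features that make that prediction harder.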

MaxMax2016 commented 1 year ago

If you want to try GRL now, note that PosteriorEncoder should be frozen during the fine-tuning stage.
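One common way to freeze a submodule in PyTorch is sketched below. It is an illustration under assumptions rather than the repository's fine-tuning code: `ToyModel`, the attribute name `enc_q`, and the `freeze` helper are hypothetical.

```python
import torch
import torch.nn as nn


class ToyModel(nn.Module):
    """Hypothetical two-part model; enc_q stands in for a PosteriorEncoder."""

    def __init__(self):
        super().__init__()
        self.enc_q = nn.Conv1d(8, 8, kernel_size=1)
        self.dec = nn.Conv1d(8, 1, kernel_size=1)

    def forward(self, x):
        return self.dec(self.enc_q(x))


def freeze(module):
    """Disable gradients for every parameter of the module."""
    for p in module.parameters():
        p.requires_grad_(False)
    # eval() also fixes dropout/batch-norm behaviour; note that a later
    # model.train() call would switch the submodule back to training mode.
    module.eval()


model = ToyModel()
freeze(model.enc_q)

# Hand only the still-trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

before = model.enc_q.weight.clone()
loss = model(torch.randn(2, 8, 20)).pow(2).mean()
loss.backward()
optimizer.step()
```

After the step, the weights of `model.enc_q` are unchanged and only `model.dec` has been updated.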

gtonkov commented 1 year ago

great, thanks :)