PlayVoice / whisper-vits-svc

Core Engine of Singing Voice Conversion & Singing Voice Clone
https://huggingface.co/spaces/maxmax20160403/sovits5.0
MIT License

Suggestions needed to improve speaker similarity #100

Closed: MuruganR96 closed this issue 11 months ago

MuruganR96 commented 11 months ago

Hi @MaxMax2016, thank you so much for this wonderful project.

I need your guidance on improving speaker similarity on the So-VITS-SVC 4.1 stable branch.

I saw that you introduced many approaches to reduce timbre in order to improve speaker similarity:

  1. GRL for speaker
  2. PPG Perturbation

Which one is better for improving similarity?

Please help me @MaxMax2016

MaxMax2016 commented 11 months ago

What works in this project is PPG perturbation. Here, the GRL model is too small to be useful.
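For readers unfamiliar with the term, one common form of PPG perturbation is to add random noise to the content features during training, so that the decoder cannot rely on residual speaker cues in them. The sketch below is only an illustration under that assumption: the function name `perturb_ppg`, the list-of-frames layout, and the `noise_std` value are hypothetical and are not taken from this project's code, which would operate on tensors rather than Python lists.

```python
import random


def perturb_ppg(ppg, noise_std=1.0, rng=None):
    """Return a copy of `ppg` with zero-mean Gaussian noise added to every value.

    ppg: list of frames, each a list of floats (hypothetical layout).
    noise_std: hypothetical noise scale, not a value from the project.
    rng: optional random.Random instance, useful for reproducible runs.
    """
    if rng is None:
        rng = random.Random()
    # Build new lists so the caller's features are never modified in place.
    return [[x + rng.gauss(0.0, noise_std) for x in frame] for frame in ppg]


# Two frames of three-dimensional toy features.
frames = [[0.1, 0.9, 0.0], [0.2, 0.7, 0.1]]
noisy = perturb_ppg(frames, noise_std=1.0, rng=random.Random(0))
```

In a setup like this, the perturbation would be applied only while training; at inference time the features would be passed through unchanged.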

MuruganR96 commented 11 months ago

Thank you @MaxMax2016

One follow-up question:

Why did you use a MIX encoder with HuBERT SSL + Whisper PPG? Is Whisper alone not good enough?

What are your suggestions on Whisper PPG for cross-lingual voice conversion?

MaxMax2016 commented 11 months ago

Whisper PPG is not so good for cross-lingual voice conversion, so a MIX encoder with HuBERT SSL + Whisper PPG is used.

MuruganR96 commented 11 months ago

@MaxMax2016 I want to introduce a GRL speaker classification loss in training.

Should I introduce it from the initial training phase, or after some interval (like 100k)?

What is your suggestion? Please guide me @MaxMax2016 :)
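The thread does not contain an answer to this question. As a neutral illustration of what is being asked, the sketch below shows the defining behavior of a gradient reversal layer (identity in the forward pass, negated and scaled gradient in the backward pass) together with a weight schedule that can express either option: enabled from step 0, or enabled after a delay. Everything here is hypothetical: the names `GradientReversal` and `grl_weight`, the linear ramp, and the reading of "100k" as training steps are assumptions, and a real implementation would hook into the framework's autograd rather than use plain lists.

```python
class GradientReversal:
    """Identity in the forward pass; flips and scales the gradient in the backward pass."""

    def __init__(self, weight=1.0):
        self.weight = weight

    def forward(self, features):
        # Features pass through unchanged to the speaker classifier.
        return list(features)

    def backward(self, grad_output):
        # The classifier's gradient is negated before reaching the encoder.
        return [-self.weight * g for g in grad_output]


def grl_weight(step, start_step=0, ramp_steps=1, max_weight=1.0):
    """Hypothetical schedule: 0 before start_step, then a linear ramp to max_weight."""
    if step < start_step:
        return 0.0
    progress = min(1.0, (step - start_step) / max(1, ramp_steps))
    return max_weight * progress


# Delayed start: off until step 100_000, fully on by step 110_000.
grl = GradientReversal()
for step in (0, 100_000, 105_000, 110_000):
    grl.weight = grl_weight(step, start_step=100_000, ramp_steps=10_000)
```

Setting `start_step=0` corresponds to introducing the loss from the initial training phase; a positive `start_step` corresponds to introducing it after an interval. Which choice works better is not settled in this thread.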