RVC-Boss / GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
MIT License
35.82k stars 4.09k forks

Improving results for english #1312

Open KaikeWesleyReis opened 4 months ago

KaikeWesleyReis commented 4 months ago

Hi! First, thanks for this awesome model: two hours with your model solved a problem I had spent 8 months on, generating one specific voice :)

I'm creating a personal chatbot (an old desire of mine) with Harbinger's voice (the main villain of Mass Effect). Given that I'm always generating the same voice, which is in English, I want to know:

KamioRinn commented 4 months ago

1. Adjust the DPO parameters.
2. Slow down the speed of the generated speech by 10-20%.
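One way to apply the second suggestion as a post-processing step is a simple time stretch of the generated waveform. The sketch below uses a naive linear-interpolation resample in plain numpy (an illustration, not GPT-SoVITS code); note that this simple approach also lowers the pitch, and a pitch-preserving stretch such as `librosa.effects.time_stretch` is usually preferable for speech.

```python
import numpy as np

def slow_down(wave: np.ndarray, rate: float = 0.85) -> np.ndarray:
    """Naive time stretch by linear-interpolation resampling.

    rate < 1.0 lengthens the audio (e.g. 0.85 is roughly 15% slower).
    Caveat: this also lowers the pitch; use a phase-vocoder stretch
    (e.g. librosa.effects.time_stretch) to keep pitch unchanged.
    """
    n_out = int(round(len(wave) / rate))
    # positions in the original signal for each output sample
    src = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(src, np.arange(len(wave)), wave)

# Example: a 1-second 440 Hz tone at 32 kHz becomes ~1.18 s at rate 0.85
sr = 32000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
slower = slow_down(tone, rate=0.85)
```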

KaikeWesleyReis commented 4 months ago

> 1. Adjust the DPO parameters.
> 2. Slow down the speed of the generated speech by 10-20%.

@KamioRinn can you explain more? I did not understand the DPO parameters: do you mean during inference or training? And how can I slow down the speech by 10-20%?

SapphireLab commented 4 months ago
  1. You can replace it with any BERT-like model you like, such as mHuBERT-147; ideally the feature shapes should match.
  2. No good suggestions, just keep experimenting.

> can you explain more? I did not understand the DPO parameters: do you mean during inference or training? And how can I slow down the speech by 10-20%?

The DPO option is a checkbox you can select when fine-tuning the GPT part of the model.

XXXXRT666 commented 4 months ago

Adjust top-k, top-p, and temperature.
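To make those knobs concrete, here is a minimal numpy sketch of how top-k, top-p, and temperature typically interact when sampling the next token in an autoregressive model (an illustration of the standard technique, not GPT-SoVITS internals; the default values are arbitrary):

```python
import numpy as np

def sample_next(logits, top_k=15, top_p=0.9, temperature=0.7, rng=None):
    """Sample one token id from logits with top-k / top-p / temperature.

    Lower temperature and smaller top-k / top-p make the output more
    deterministic (steadier speech); higher values add variety.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    # top-k: keep only the k largest logits
    if top_k and top_k < len(logits):
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits < kth, -np.inf, logits)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # top-p (nucleus): keep the smallest set of tokens with mass >= top_p
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, top_p) + 1   # always keep at least one token
    mask = np.zeros_like(probs)
    mask[order[:cutoff]] = probs[order[:cutoff]]
    mask /= mask.sum()
    return int(rng.choice(len(probs), p=mask))
```

With `top_k=1` the sampler always returns the argmax, which is the fully deterministic extreme of these settings.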

KaikeWesleyReis commented 4 months ago

Thanks! I'll try the first one and report back. As for DPO, I won't touch it for now.

KaikeWesleyReis commented 4 months ago
> 1. You can replace it with any BERT-like model you like, such as mHuBERT-147; ideally the feature shapes should match.
> 2. No good suggestions, just keep experimenting.
>
> The DPO option is a checkbox you can select when fine-tuning the GPT part of the model.

I tried mHuBERT-147 as you said, but the results were awful... If I change the HuBERT model, it's necessary to redo the fine-tuning, right @SapphireLab?

SapphireLab commented 4 months ago

> I tried mHuBERT-147 as you said, but the results were awful... If I change the HuBERT model, it's necessary to redo the fine-tuning, right

I think you had better fine-tune the model again, because the base model uses a different HuBERT model to extract features. I have only tested the Chinese case with mHuBERT-147, and it did not sound bad.