RVC-Project / Retrieval-based-Voice-Conversion-WebUI

Easily train a good VC model with voice data <= 10 mins!
MIT License

Question: Speaker IDs overwritten when fine-tuning? #648

Closed: Rolun closed this issue 1 year ago

Rolun commented 1 year ago

Hello,

If I understand correctly, RVC was pre-trained on VCTK with multiple speakers. However, after fine-tuning on a new voice, I can't seem to access any of the original 109 speakers: every speaker ID now returns the new voice. Is the "emb_g" (speaker embedding) layer reset during fine-tuning, or is something else going on?

Thanks in advance!

RVC-Boss commented 1 year ago

This is normal, because you did not use the VCTK training set for fine-tuning. If you want the VCTK voices, use the pre-trained model without fine-tuning it. IDs 1 to 108 correspond to 0 to 107 in VCTK.
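
The ID offset in the reply above can be sketched as a small helper. This assumes one reading of the reply, namely that speaker ID n in the pre-trained model maps to zero-based position n - 1 in VCTK. The function name is hypothetical and is not part of the RVC codebase.

```python
def vctk_index_from_speaker_id(speaker_id: int) -> int:
    """Map a model speaker ID (1..108) to a zero-based VCTK index (0..107).

    Assumption: the offset is exactly one, as the reply above suggests.
    """
    if not 1 <= speaker_id <= 108:
        raise ValueError(f"speaker_id must be in 1..108, got {speaker_id}")
    return speaker_id - 1


print(vctk_index_from_speaker_id(1))
print(vctk_index_from_speaker_id(108))
```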

Rolun commented 1 year ago

@RVC-Boss Thanks! Does the .index file for the pre-trained model exist somewhere? I couldn't find it in the repo.

I'm also still curious why I can no longer access those speakers. I understand that I haven't fine-tuned on the VCTK training set, but shouldn't the weights from the pre-trained model (which was trained on VCTK) still let me pass in the speaker IDs and get the original voices out, even after fine-tuning? Or does it have something to do with the .index file only containing the one new voice I've fine-tuned on?
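
One possible explanation for the question above, offered as an illustration and not confirmed in this thread: fine-tuning with a single speaker ID leaves the other embedding rows untouched but still updates the weights shared by every speaker, so the output for every ID shifts. The toy below is not RVC's architecture. `emb` stands in for an embedding table such as "emb_g", and `w` and `b` are hypothetical shared weights.

```python
# Toy model, NOT RVC's real architecture: output = w * emb[sid] + b.
# `emb` stands in for a speaker-embedding table such as "emb_g";
# `w` and `b` stand in for weights shared by every speaker.

def predict(emb, w, b, sid):
    return w * emb[sid] + b


def finetune(emb, w, b, sid, target, lr=0.05, steps=200):
    """Plain SGD on the squared error, using only one speaker ID."""
    emb = list(emb)  # work on a copy so the original table is kept
    for _ in range(steps):
        err = predict(emb, w, b, sid) - target
        # Gradients of err**2 with respect to each parameter.
        grad_w = 2 * err * emb[sid]
        grad_b = 2 * err
        grad_e = 2 * err * w
        w -= lr * grad_w
        b -= lr * grad_b
        emb[sid] -= lr * grad_e  # only the row for `sid` is ever updated
    return emb, w, b


emb, w, b = [0.5, 1.0, 2.0], 1.0, 0.0
new_emb, new_w, new_b = finetune(emb, w, b, sid=0, target=3.0)

print("rows for unseen speakers unchanged:", new_emb[1:] == emb[1:])
for sid in (1, 2):
    before = predict(emb, w, b, sid)
    after = predict(new_emb, new_w, new_b, sid)
    print(f"speaker {sid}: {before} -> {after}")
```

In this sketch the rows for speakers 1 and 2 are identical before and after, yet their outputs change because `w` and `b` moved. The toy uses plain SGD; an optimizer with weight decay would also shrink the unused rows.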