RVC-Boss / GPT-SoVITS

1 min of voice data can also be used to train a good TTS model! (few-shot voice cloning)
MIT License

[Follow-Up] Cross-Lingual Inference Loss After Fine-Tuning On A New Language #1655

Open justinjohn0306 opened 2 months ago

justinjohn0306 commented 2 months ago

@RVC-Boss

This is a follow-up ticket to the issue: #1626.

Steps to Reproduce:

  1. Fine-tune the base model using a 24-hour Spanish dataset.
  2. Align the fine-tuned model with the base.
  3. Try cross-lingual inference with other languages (e.g., English, Chinese).

Expected Behavior:

The model should retain cross-lingual abilities and be able to infer in languages other than Spanish (e.g., English or Chinese).

Actual Behavior:

The model appears to have lost the ability to handle languages outside of the one it was fine-tuned on.

Question:

Is there a recommended way to preserve cross-lingual capabilities during fine-tuning? Should earlier layers be frozen during fine-tuning to avoid losing this ability?

RVC-Boss commented 2 months ago

You should fine-tune the model on all of the languages you need to cross between when doing cross-lingual inference.
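For anyone following along, a minimal sketch of one way to act on this advice: merge per-language annotation lists into a single combined list before running data preparation, so fine-tuning sees every language you plan to infer in. The `path|speaker|language|text` line format follows the one described in the GPT-SoVITS docs; the file names here are placeholders, not files from this repo.

```python
# Hypothetical sketch: merge per-language annotation lists into one
# combined training list. Assumed line format (per the GPT-SoVITS docs):
#   wav_path|speaker_name|language|text
from pathlib import Path

# Placeholder names -- substitute your own per-language .list files.
per_language_lists = ["spanish.list", "english.list", "chinese.list"]

combined = []
for list_file in per_language_lists:
    for line in Path(list_file).read_text(encoding="utf-8").splitlines():
        if line.strip():  # skip blank lines
            combined.append(line.strip())

Path("multilingual.list").write_text("\n".join(combined) + "\n", encoding="utf-8")
print(f"Wrote {len(combined)} utterances to multilingual.list")
```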

Freezing earlier layers: it may work, but I haven't done the experiment, so I can't give you a clear conclusion.
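Since this is untested, here is only a generic PyTorch sketch of what freezing earlier layers would look like, not anything specific to GPT-SoVITS internals. `model.blocks` and `n_frozen` are assumptions; inspect the actual model to find the module list that holds its layers.

```python
# Generic PyTorch sketch (untested on GPT-SoVITS): freeze the first
# n_frozen blocks so only the later layers are updated during fine-tuning.
import torch

def freeze_early_layers(model: torch.nn.Module, n_frozen: int) -> None:
    # "model.blocks" is an assumed attribute name for the ModuleList of
    # transformer layers; adapt it to the real module structure.
    for block in model.blocks[:n_frozen]:
        for param in block.parameters():
            param.requires_grad = False

def make_optimizer(model: torch.nn.Module, lr: float = 1e-4):
    # Build the optimizer only over parameters that still require
    # gradients, so frozen layers are excluded from updates entirely.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr)
```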

justinjohn0306 commented 2 months ago

Gotcha, yeah that definitely makes sense!

Sheldonimo commented 1 month ago

@justinjohn0306 any update? I am trying to do the same.

AngelGuevara7 commented 1 month ago

Hey @justinjohn0306! I'm trying to do the same too. I have started with just the Spanish dataset to check whether it learns new languages, and I already have the training up and running. I saw in other issues that you already have a Spanish model. Could you share the details of your training params to compare results? Configuration files are fine if you don't mind. Thanks in advance!!