Plachtaa / VALL-E-X

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io
MIT License

Adding an extra language to the checkpoint results in a loss of all previously learned languages #176

Closed · yuossfalaa closed this issue 2 months ago

yuossfalaa commented 2 months ago

I added Arabic to the model and all the metrics went up; everything seemed fine until I tested after training. Now only Arabic works and all the other languages produce gibberish. Has anyone encountered this issue before?

All I did to the checkpoint before training was append a new row to the language embedding tables of the AR and NAR models:

```python
import torch

device = 'cuda'  # wherever the checkpoint tensors should live

checkpoint = torch.load('checkpoints/vallex-checkpoint.pt')

# Append one randomly initialised 1024-dim row for the new language (Arabic)
# to both the AR and NAR language embedding tables.
checkpoint['model']['ar_language_embedding.word_embeddings.weight'] = torch.cat(
    [checkpoint['model']['ar_language_embedding.word_embeddings.weight'].to(device),
     torch.rand(1, 1024).to(device)])
checkpoint['model']['nar_language_embedding.word_embeddings.weight'] = torch.cat(
    [checkpoint['model']['nar_language_embedding.word_embeddings.weight'].to(device),
     torch.rand(1, 1024).to(device)])

torch.save(checkpoint, 'vallex-checkpoint.pt')
```
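A quick way to confirm the surgery took effect before launching training (a minimal sketch; it just prints both table shapes, which should show one extra row):

```python
import torch

# Sanity check: both language embedding tables should have gained exactly one row.
ckpt = torch.load('vallex-checkpoint.pt', map_location='cpu')
for key in ('ar_language_embedding.word_embeddings.weight',
            'nar_language_embedding.word_embeddings.weight'):
    print(key, tuple(ckpt['model'][key].shape))  # expect (old_num_langs + 1, 1024)
```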

And then training:

```bash
python3 -m bin.trainer --max-duration 40 --filter-min-duration 0.5 --filter-max-duration 14 \
  --train-stage 0 --num-buckets 12 --save-every-n 10000 --valid-interval 20000 \
  --model-name valle --decoder-dim 1024 --nhead 16 --num-decoder-layers 12 --prefix-mode 1 \
  --base-lr 0.05 --warmup-steps 200 --average-period 0 --num-epochs 20 \
  --start-epoch 1 --start-batch 0 --accumulate-grad-steps 4 --exp-dir ${exp_dir}
```

Plachtaa commented 2 months ago

> Now only Arabic works and all the other languages produce gibberish.

If your dataset contains only Arabic data, this is expected.

yuossfalaa commented 2 months ago

> > Now only Arabic works and all the other languages produce gibberish.
>
> If your dataset contains only Arabic data, this is expected.

But I started training from your checkpoint, which already covers English, for example; I thought it would now only need Arabic to build on its previous knowledge. Should I make a combined dataset from both languages for training, or can I train the first 5 epochs on Arabic and then 5 epochs on English and expect a good result?

yuossfalaa commented 2 months ago

Here's my new experiment: I trained a model on 1200 hours of Arabic for 5 epochs, then combined that dataset with an English one (LibriTTS) and trained for one extra epoch. Yet Arabic now seems lost, and English didn't work at all. The weirdest thing in this experiment is that accuracy kept going up, reaching 0.68 and 0.58 for the AR and NAR models respectively.

Should I combine both datasets and train from scratch, and will this work? @Plachtaa, please help me; I have very limited resources and can't risk failing again in this training.

Plachtaa commented 2 months ago

> Here's my new experiment: I trained a model on 1200 hours of Arabic for 5 epochs, then combined that dataset with an English one (LibriTTS) and trained for one extra epoch. Yet Arabic now seems lost, and English didn't work at all. The weirdest thing in this experiment is that accuracy kept going up, reaching 0.68 and 0.58 for the AR and NAR models respectively.
>
> Should I combine both datasets and train from scratch, and will this work?

You must combine all the datasets you wish to include and finish training in one run.
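A minimal sketch of what that combination could look like, assuming an lhotse-based data pipeline as in the upstream vall-e recipe (the manifest paths below are placeholders, not files from this repo):

```python
# Merge the Arabic and LibriTTS cut manifests into a single training set
# so both languages are seen throughout one combined run.
from lhotse import CutSet

arabic_cuts = CutSet.from_file('data/arabic/cuts_train.jsonl.gz')     # placeholder path
english_cuts = CutSet.from_file('data/libritts/cuts_train.jsonl.gz')  # placeholder path

# mux() interleaves the sources at random, so every batch mixes both
# languages instead of seeing long single-language stretches; pass
# weights= to bias sampling toward the larger corpus if desired.
combined = CutSet.mux(arabic_cuts, english_cuts)

combined.to_file('data/combined/cuts_train.jsonl.gz')
```

Training on a mixed manifest in a single run means every language keeps receiving gradient updates, which avoids the catastrophic-forgetting behaviour reported in the sequential experiments above.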

yuossfalaa commented 2 months ago

Thank you very much