anhnh2002 / XTTSv2-Finetuning-for-New-Languages

does finetuning the DVAE component make any difference? #9

Closed thivux closed 1 month ago

thivux commented 1 month ago

Hi, I am fine-tuning this model on a Vietnamese dataset and have run into a problem when generating short sentences: parts of the audio are unintelligible. After some research, I found a comment saying that fine-tuning the DVAE on my dataset would get rid of this problem. I am not sure this will work, since the DVAE is already trained on a lot of data and I would expect it to generalize well to new data. What is your experience with XTTS, with and without fine-tuning the DVAE component? Does fine-tuning the DVAE help with short sentences?

thivux commented 1 month ago

If the DVAE does make a difference, how many epochs did you fine-tune it for?

anhnh2002 commented 1 month ago

> If the DVAE does make a difference, how many epochs did you fine-tune it for?

Here are the hyperparameters that gave the best performance in my experience.

CUDA_VISIBLE_DEVICES=0 python train_dvae_xtts.py \
--output_path=checkpoints/ \
--train_csv_path=datasets/metadata_train.csv \
--eval_csv_path=datasets/metadata_eval.csv \
--language="vi" \
--num_epochs=5 \
--batch_size=512 \
--lr=5e-6
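
To put `--num_epochs=5` and `--batch_size=512` in perspective, here is a rough sketch of how many optimizer updates those settings imply. It assumes one update per batch and that the last partial batch is kept; whether `train_dvae_xtts.py` actually behaves this way is not confirmed in this thread, and the 10,000-clip dataset size is a hypothetical figure used only for illustration.

```python
import math


def dvae_update_steps(num_samples: int, batch_size: int = 512, num_epochs: int = 5) -> int:
    """Estimate total optimizer updates for a fine-tuning run.

    Assumption (not confirmed by the thread): one update per batch,
    no gradient accumulation, and the last partial batch is kept.
    """
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * num_epochs


# Hypothetical dataset of 10,000 clips: 20 batches per epoch * 5 epochs
print(dvae_update_steps(10_000))  # 100
```

With a batch size this large and a learning rate of 5e-6, a dataset of that size would see only about a hundred small updates, so under these assumptions the run amounts to a light adjustment of the pretrained DVAE rather than a retraining.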

vcstack commented 1 month ago

@thivux Hello, I also want to train this model for Vietnamese speech. I followed the instructions and am currently running into an error. If you have trained this model successfully, could you make it public so I can use it as a reference?