Closed: thivux closed this issue 2 months ago
If fine-tuning the DVAE does make a difference, how many epochs did you fine-tune it for?
Here are the hyperparameters that gave the best performance in my experience.
CUDA_VISIBLE_DEVICES=0 python train_dvae_xtts.py \
--output_path=checkpoints/ \
--train_csv_path=datasets/metadata_train.csv \
--eval_csv_path=datasets/metadata_eval.csv \
--language="vi" \
--num_epochs=5 \
--batch_size=512 \
--lr=5e-6
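For a sense of scale: with a batch size of 512 and only 5 epochs, the DVAE sees few optimizer updates unless the dataset is large. Here is a rough sketch of that arithmetic. It assumes one update per batch and that the last partial batch is kept; I have not checked whether train_dvae_xtts.py does either. The function name and the dataset sizes are made up for illustration.

```python
import math


def dvae_update_steps(num_samples: int, batch_size: int = 512, num_epochs: int = 5) -> int:
    """Estimate the total number of optimizer updates for a fine-tuning run.

    Assumption (not verified against train_dvae_xtts.py): one update per
    batch, and the final partial batch of each epoch is not dropped.
    """
    if num_samples <= 0 or batch_size <= 0 or num_epochs <= 0:
        raise ValueError("num_samples, batch_size and num_epochs must be positive")
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * num_epochs


if __name__ == "__main__":
    # Hypothetical dataset sizes, only to show how the step count scales.
    for n in (2_000, 10_000, 50_000):
        print(f"{n} clips -> about {dvae_update_steps(n)} updates")
```

Combined with lr=5e-6, a small step count means the DVAE weights move only slightly from the pretrained checkpoint, which is worth keeping in mind when comparing results with and without DVAE fine-tuning.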
@thivux Hello. I also want to train this model for Vietnamese speech. I followed the instructions and am currently running into an error. If you have trained this model successfully, could you make it public so I can use it as a reference?
Hi, I am fine-tuning this model on a Vietnamese dataset and have a problem with generating short sentences: the audio is unintelligible in some parts. After doing some research, I found a comment saying that fine-tuning the DVAE on my dataset will get rid of this problem. I am not sure this will work, as the DVAE is already trained on a lot of data and I would expect it to generalize well to new data. What is your experience with XTTS, with and without fine-tuning the DVAE component? Does fine-tuning the DVAE help with short sentences?