KoeAI / LLVC

MIT License

Discriminator missing for model finetuning #12

Open slimsushi opened 5 months ago

slimsushi commented 5 months ago

Hi,

I plan to fine-tune the pretrained generator model on a new personalized voice in German, and I also plan to add noisy audio recordings to make it more robust in bad recording conditions. This is basically what you mention in your paper: "[...] our model could be fine-tuned on a dataset comprised of only a single input speaker converted to a target voice in order to create a personalized voice conversion model."

Now my question is: do I also need the pretrained discriminator model to fine-tune the generator model? In your provided KoeAI/llvc_models on Huggingface, only the pretrained generator is saved, and I cannot find the corresponding discriminator model. Also, in your train.py script, training only continues from the provided checkpoint if both the generator and discriminator models are given. Is it still possible to train with a new discriminator initialized from scratch? Or could you provide me with both the pretrained generator and discriminator models?

Thanks for your help

ksadov commented 5 months ago

I've found that initializing the discriminator from scratch actually works better than using a pretrained discriminator when fine-tuning a model. I thought that the provided train script allowed for this, but apparently not; I'll see if I can get around to fixing this by the end of the week, or feel free to open a PR if you get it working for yourself.
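Until train.py is patched, here is a minimal sketch of the workaround: restore only the generator checkpoint and leave the discriminator at its random initialization. The names used here (`net_g`, `net_d`, the `"model"` key) are assumptions based on typical VITS-style training scripts, not necessarily the exact llvc code, so adapt them to the repo's checkpoint format:

```python
# Hypothetical workaround sketch. net_g / net_d and the "model" key follow
# common VITS-style training scripts and may differ from the actual llvc
# train.py.
import torch

def load_for_finetuning(net_g, g_ckpt_path, net_d=None, d_ckpt_path=None):
    """Restore the pretrained generator; leave the discriminator fresh
    unless a discriminator checkpoint is explicitly provided."""
    g_state = torch.load(g_ckpt_path, map_location="cpu")
    # Some checkpoints wrap the weights in a dict under a "model" key.
    net_g.load_state_dict(g_state.get("model", g_state))

    if net_d is not None and d_ckpt_path is not None:
        d_state = torch.load(d_ckpt_path, map_location="cpu")
        net_d.load_state_dict(d_state.get("model", d_state))
    # Otherwise net_d keeps its random initialization, which per the
    # comment above tends to work better for fine-tuning.
    return net_g, net_d
```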

Yaodada12 commented 5 months ago

> I've found that initializing the discriminator from scratch actually works better than using a pretrained discriminator when fine-tuning a model. I thought that the provided train script allowed for this, but apparently not; I'll see if I can get around to fixing this by the end of the week, or feel free to open a PR if you get it working for yourself.

Following the paper, I loaded your G_5000.pth and trained for 39 epochs on the train-360 data on a single 3090, but my loss keeps increasing. What is the reason? What is a normal total loss (loss_disc + loss_gen_all) at 39 epochs?

slimsushi commented 5 months ago

> Following the paper, I loaded your G_5000.pth and trained for 39 epochs on the train-360 data on a single 3090, but my loss keeps increasing. What is the reason? What is a normal total loss (loss_disc + loss_gen_all) at 39 epochs?

When you use the checkpoint G_500000.pt provided by the author, you are starting from a model that was already pretrained on that same dataset. It will therefore overfit and the loss will increase, because it has already seen that data. When using the checkpoint, fine-tune on new, unseen data for your specific task.

Yaodada12 commented 5 months ago

> Following the paper, I loaded your G_5000.pth and trained for 39 epochs on the train-360 data on a single 3090, but my loss keeps increasing. What is the reason? What is a normal total loss (loss_disc + loss_gen_all) at 39 epochs?

> When you use the checkpoint G_500000.pt provided by the author, you are starting from a model that was already pretrained on that same dataset. It will therefore overfit and the loss will increase, because it has already seen that data. When using the checkpoint, fine-tune on new, unseen data for your specific task.

Sorry for the confusion: I am using the train-360 dataset that has been converted to another speaker's voice.