OlaWod / FreeVC

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
MIT License

poor performance on seen-to-unseen task while finetuning on Hindi language #79

Open rgenai opened 1 year ago

rgenai commented 1 year ago

Hello! I'm delighted to come across this remarkable project, and thanks for sharing it as an open-source project. Currently, my focus lies on fine-tuning the freevc-s model using pretrained checkpoints as the foundation, specifically on a Hindi dataset. While I've achieved impressive results in seen-to-seen and unseen-to-seen tasks, with a remarkable 95% match, I'm eager to enhance the performance in the seen-to-unseen task. Presently, I'm encountering a moderate 60% match when working with the reference speaker for unseen-to-unseen and seen-to-unseen tasks. I would greatly appreciate any insights or suggestions you have to improve these results further.

EmreOzkose commented 11 months ago

Hi @MuruganR96, how did you train on another language? Did you also train WavLM?

emonigma commented 7 months ago

Hi @MuruganR96, I want to do what you did and fine-tune FreeVC on a non-English dataset. Your 95% match on seen-to-seen would be perfect for my use case. Could you please provide guidance or share your code?