Tinglok / CVC

CVC: Contrastive Learning for Non-parallel Voice Conversion (INTERSPEECH 2021, in PyTorch)
MIT License

Train time #2

Closed · 17011775 closed this issue 4 years ago

17011775 commented 4 years ago

Hi, thanks for implementing this in PyTorch. In the paper, CVC's training time was 518 minutes (1000 epochs), but when I ran the code, it took about an hour per epoch.

I think the size of the dataset is the problem: when preparing the dataset, I copied all of the speakers in VCTK into ./voice/trainA and ./voice/trainB.

Is using all speakers in VCTK for training the right way, or should I just sample two speakers A and B?

Thanks!

Tinglok commented 4 years ago

Hi, you're right. The training time mentioned in the paper refers to one-to-one VC, i.e. sampling two speakers A and B. Copying all the speakers in VCTK into ./voice/trainA and ./voice/trainB will certainly increase the training time. You can consider increasing the batch size for training efficiency.

If you want to work on many-to-one VC, simply sample one speaker from VCTK into ./voice/trainB and put the rest into ./voice/trainA; a sketch of that split follows below.

If you want to work on many-to-many VC, I suggest adding a pre-trained speaker encoder module to the CVC decoder module. Feel free to open a pull request to this repo if it works. :)
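For reference, here is a minimal sketch of the many-to-one split, assuming VCTK's wav48/&lt;speaker&gt; directory layout (the VCTK-Corpus path and the file-naming scheme are assumptions; adjust them to your checkout):

```python
import random
import shutil
from pathlib import Path

# Assumed paths: VCTK's wav48/<speaker_id> layout and this repo's
# ./voice/trainA and ./voice/trainB training directories.
VCTK_ROOT = Path("VCTK-Corpus/wav48")
TRAIN_A = Path("voice/trainA")
TRAIN_B = Path("voice/trainB")

def copy_speaker(speaker_dir: Path, dest: Path) -> None:
    """Copy all wav files of one speaker into a flat destination folder."""
    dest.mkdir(parents=True, exist_ok=True)
    for wav in speaker_dir.glob("*.wav"):
        # Prefix with the speaker ID so filenames stay unique when flattened.
        shutil.copy(wav, dest / f"{speaker_dir.name}_{wav.name}")

speakers = sorted(p for p in VCTK_ROOT.iterdir() if p.is_dir())

# Many-to-one: one randomly chosen target speaker goes to trainB,
# all remaining speakers go to trainA.
target = random.choice(speakers)
copy_speaker(target, TRAIN_B)
for spk in speakers:
    if spk != target:
        copy_speaker(spk, TRAIN_A)

# For one-to-one VC (the 518-minute setting from the paper), copy just a
# single source speaker into trainA instead of the whole remainder, e.g.:
# copy_speaker(speakers[0], TRAIN_A)
```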