Hi, you were right. The training time we mentioned in the paper is for one-to-one VC, i.e., sampling two speakers A and B. If you copy all the speakers in VCTK to ./voice/trainA and ./voice/trainB, training time will certainly increase. You can consider increasing the batch size for training efficiency. If you want to research many-to-one VC, simply put one VCTK speaker in ./voice/trainB and the rest in ./voice/trainA, as sketched below. If you want to research many-to-many VC, I suggest adding a pre-trained speaker encoder module to the CVC decoder module. Feel free to open a pull request to this repo if it works. :)
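If it helps, here is a minimal sketch of that many-to-one split, assuming the standard VCTK layout (`wav48/<speaker>/*.wav`); the corpus path and target speaker id (`p225`) are placeholders, not part of this repo:

```python
# Minimal sketch: prepare a many-to-one VC split from VCTK.
# Assumptions: VCTK is unpacked as VCTK-Corpus/wav48/<speaker_id>/*.wav,
# and the target speaker id below is just an example - adjust as needed.
import shutil
from pathlib import Path

VCTK_WAV_DIR = Path("VCTK-Corpus/wav48")  # assumed VCTK layout
TRAIN_A = Path("./voice/trainA")          # source speakers (the "many")
TRAIN_B = Path("./voice/trainB")          # single target speaker (the "one")
TARGET_SPEAKER = "p225"                   # hypothetical target speaker

TRAIN_A.mkdir(parents=True, exist_ok=True)
TRAIN_B.mkdir(parents=True, exist_ok=True)

for speaker_dir in sorted(VCTK_WAV_DIR.iterdir()):
    if not speaker_dir.is_dir():
        continue
    # Route the chosen speaker's wavs to trainB, everyone else's to trainA.
    dest = TRAIN_B if speaker_dir.name == TARGET_SPEAKER else TRAIN_A
    for wav in speaker_dir.glob("*.wav"):
        # Prefix filenames with the speaker id so they stay unique after copying.
        shutil.copy(wav, dest / f"{speaker_dir.name}_{wav.name}")
```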
Hi, thanks for implementing this in PyTorch. In the paper, CVC's training time was 518 minutes (1000 epochs). But when I ran the code, it took about an hour per epoch.
I think the size of the dataset is the problem: when preparing the dataset, I copied all of the speakers in VCTK to ./voice/trainA and ./voice/trainB.
Is using all VCTK speakers the right way to train, or should I just sample two speakers A and B?
Thanks!