Hi, you were right. The training time we mentioned in the paper is for one-to-one VC, i.e., sampling two speakers A and B. If you copy all the speakers in VCTK to ./voice/trainA and ./voice/trainB, training time will certainly increase. You can consider increasing the batch size for training efficiency. If you want to research many-to-one VC, simply put one VCTK speaker in ./voice/trainB and the rest in ./voice/trainA, as sketched below. If you want to research many-to-many VC, I suggest adding a pre-trained speaker encoder module to the CVC decoder module. Feel free to open a pull request to this repo if it works. :)
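If it helps, here is a minimal sketch of that many-to-one split, assuming the standard VCTK layout (`wav48/<speaker>/*.wav`); the corpus path and target speaker id (`p225`) are placeholders, not part of this repo:

```python
# Minimal sketch: prepare a many-to-one VC split from VCTK.
# Assumptions: VCTK is unpacked as VCTK-Corpus/wav48/<speaker_id>/*.wav,
# and the target speaker id below is just an example - adjust as needed.
import shutil
from pathlib import Path

VCTK_WAV_DIR = Path("VCTK-Corpus/wav48")  # assumed VCTK layout
TRAIN_A = Path("./voice/trainA")          # source speakers (the "many")
TRAIN_B = Path("./voice/trainB")          # single target speaker (the "one")
TARGET_SPEAKER = "p225"                   # hypothetical target speaker

TRAIN_A.mkdir(parents=True, exist_ok=True)
TRAIN_B.mkdir(parents=True, exist_ok=True)

for speaker_dir in sorted(VCTK_WAV_DIR.iterdir()):
    if not speaker_dir.is_dir():
        continue
    # Route the chosen speaker's wavs to trainB, everyone else's to trainA.
    dest = TRAIN_B if speaker_dir.name == TARGET_SPEAKER else TRAIN_A
    for wav in speaker_dir.glob("*.wav"):
        # Prefix filenames with the speaker id so they stay unique after copying.
        shutil.copy(wav, dest / f"{speaker_dir.name}_{wav.name}")
```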
Hi, thanks for implementing this in PyTorch. In the paper, CVC's training time was 518 minutes (1000 epochs). But when I ran the code, it took about an hour per epoch.
I think the size of the dataset is the problem: when preparing the dataset, I copied all of the speakers in VCTK to ./voice/trainA and ./voice/trainB.
Is using all VCTK speakers the right way to train, or should I just sample two speakers A and B?
Thanks!