Closed MorganCZY closed 5 years ago
Could you please explain what kinds of datasets were used to train the "linda_johnson.pt" and "multi_speaker.pt" models? Does "multi_speaker.pt" correspond to the multi-speaker model described in the paper? And what was "linda_johnson.pt" trained on?
linda_johnson.pt is trained on the open-source LJ Speech dataset: https://keithito.com/LJ-Speech-Dataset/
Yes, multi_speaker.pt is trained on our internal multi-speaker dataset and is expected to generalize to new speakers.
@wezteoh I tested the multi_speaker.pt model on synthesizing new speakers' voices and it did well. Has your team explored the minimum amount of training data needed per speaker, and the minimum number of speakers, to achieve this effect? Btw, are the speakers in your internal multi-speaker dataset all English speakers?
All English speakers, and we used about 10 hours per speaker.
Thanks! Also, when I run this repo with LJSpeech, I find training far slower on a 2080 Ti GPU than the reference time on the demo webpage (shown in the picture). I stripped out the mel-computation module and fed pre-processed mels directly into the training process through the dataloader, but it didn't speed up training. Could you give me any advice on speeding it up?
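(Editorial aside, not from the thread: to make the pre-processing idea above concrete, here is a minimal sketch of caching mels together with the matching audio ahead of training. It assumes librosa and PyTorch with LJSpeech-style 22.05 kHz parameters; `precompute_mels` and every parameter value are illustrative, not the repo's actual code.)

```python
# Hypothetical offline pre-processing: compute each clip's mel once and
# cache it alongside the raw audio, so the dataloader skips STFT work
# on every epoch.
import glob
import os

import librosa
import torch


def precompute_mels(wav_dir, out_dir, sr=22050, n_fft=1024,
                    hop_length=256, n_mels=80):
    os.makedirs(out_dir, exist_ok=True)
    for wav_path in glob.glob(os.path.join(wav_dir, "*.wav")):
        audio, _ = librosa.load(wav_path, sr=sr)
        # (n_mels, frames) power mel spectrogram
        mel = librosa.feature.melspectrogram(
            y=audio, sr=sr, n_fft=n_fft,
            hop_length=hop_length, n_mels=n_mels)
        name = os.path.splitext(os.path.basename(wav_path))[0]
        torch.save({"audio": torch.from_numpy(audio),
                    "mel": torch.from_numpy(mel)},
                   os.path.join(out_dir, name + ".pt"))
```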
I rewrote AudioDataset to feed pre-processed audio data directly instead of converting wavs on the fly. Training speed is now close to the reported one.
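(Editorial aside: MorganCZY's rewrite is not posted in the thread. A cache-backed dataset could look roughly like the sketch below, assuming files saved by the hypothetical `precompute_mels` above; `PrecomputedAudioDataset`, `segment_frames`, and the crop logic are made-up illustrations, not the actual code.)

```python
# Hypothetical dataset that serves cached (audio, mel) pairs instead of
# decoding and transforming wavs inside __getitem__ every epoch.
import glob
import os
import random

import torch
from torch.utils.data import Dataset


class PrecomputedAudioDataset(Dataset):
    def __init__(self, cache_dir, segment_frames=64, hop_length=256):
        self.paths = sorted(glob.glob(os.path.join(cache_dir, "*.pt")))
        self.segment_frames = segment_frames
        self.hop_length = hop_length

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        item = torch.load(self.paths[idx])
        mel, audio = item["mel"], item["audio"]
        # Random fixed-length crop so default collation can stack batches;
        # assumes every cached clip is at least segment_frames long.
        start = random.randint(0, mel.shape[1] - self.segment_frames)
        mel = mel[:, start:start + self.segment_frames]
        audio = audio[start * self.hop_length:
                      (start + self.segment_frames) * self.hop_length]
        return audio, mel
```

Feeding this through a standard DataLoader with num_workers > 0 should keep the GPU from waiting on feature extraction.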
@MorganCZY so create a pull request.
@MorganCZY can you share your code?