karkirowle opened this issue 4 years ago
I assume you're training on a single speaker. Instead, train on all LibriTTS + your own data.
I have a multi-speaker dataset (both male and female speakers). Total speakers = 21; total size = 13.36 hours; total audio clips (each between 3 and 10 seconds long) = 14,453.
Each speaker has recorded around 40 minutes of audio.
I would like to train multi-speaker data using this repo.
Any comments @rafaelvalle
Hi @rafaelvalle, could you please answer these questions?
@shwetagargade216 I would train with both if you are training with LibriTTS. You are going to train a multi-speaker setup anyway, so more data can only benefit in that case. If you have a different language, I would maybe try only your data. But these things you usually cannot know in advance. If you train with both, make sure the format is the same, e.g. the sampling frequency.
@karkirowle Training from scratch might be time-consuming and costly, so I would like to try transfer learning first using the LibriTTS dataset.
And I do have an English language dataset.
I've been trying to run some adaptation experiments with Mellotron, i.e. using a small amount of data (less than an hour) to shift the acoustics of an existing speaker towards a different speaker. In other words, even if there is not a large amount of data from a male/female singer, it should be possible to move the acoustics by retraining with a similar speaker's id.
My experiments haven't been successful so far. Interestingly, I found that even the other speakers get affected during adaptation, and the output quickly becomes unintelligible.
Have you tried something like that? What layers should be ignored for adaptation?
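For reference, the mechanics of what I'm attempting look roughly like this: load the pretrained checkpoint everywhere except the layers I want to re-learn, then freeze the rest so gradient updates only touch those layers. A PyTorch sketch (the `speaker_embedding` prefix and the `Toy` model are placeholders, not Mellotron's actual parameter names):

```python
import torch
import torch.nn as nn

def prepare_for_adaptation(model, pretrained_state,
                           tune_prefixes=("speaker_embedding",)):
    """Copy pretrained weights except the layers selected for re-learning,
    then freeze everything else so only those layers move during adaptation."""
    kept = {k: v for k, v in pretrained_state.items()
            if not k.startswith(tune_prefixes)}
    # strict=False allows the skipped (re-initialised) layers to stay as-is.
    model.load_state_dict(kept, strict=False)
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(tune_prefixes)
    return model

# Toy stand-in for the real model, just to show the mechanics.
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.speaker_embedding = nn.Embedding(4, 8)
        self.decoder = nn.Linear(8, 8)

pretrained = Toy().state_dict()
adapted = prepare_for_adaptation(Toy(), pretrained)
```

With this setup only the speaker embedding receives gradients, so in principle the other speakers should stay fixed; whether that is the right set of layers to leave trainable is exactly my question.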