KoeAI / LLVC


Different Sample Rate (With retraining) #6

Open artificalaudio opened 1 year ago

artificalaudio commented 1 year ago

Hi,

Apologies if this is a silly question. If I train a model on a dataset made at a different sample rate, will this technique still work? E.g. the training data would come from normal speech/singing at 40 kHz, paired with time-synced responses from a 40 kHz RVC model.

Without changing anything internal to the LLVC model, can I use a different sample rate (given that I've made a dataset at 40 kHz, for instance)?

(Would changing the SR in config actually do anything to the model?)
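For context, this is roughly how I was planning to prepare the 40 kHz pairs (just a sketch using torchaudio; the file layout and the 40 kHz target are my own assumptions, not anything taken from the LLVC code or configs):

```python
from pathlib import Path

import torchaudio
import torchaudio.functional as F

TARGET_SR = 40_000  # assumed target rate; not a value from the LLVC configs


def resample_pair(src_path: Path, tgt_path: Path, out_src: Path, out_tgt: Path) -> None:
    """Resample a time-synced (source, target) clip pair to TARGET_SR."""
    for in_path, out_path in ((src_path, out_src), (tgt_path, out_tgt)):
        wav, sr = torchaudio.load(str(in_path))
        if sr != TARGET_SR:
            wav = F.resample(wav, orig_freq=sr, new_freq=TARGET_SR)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        torchaudio.save(str(out_path), wav, TARGET_SR)
```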

I think the paper said training took 3 days on a decent GPU; I'm guessing training time would be longer for a higher sample rate.

I'm also intrigued by the paper's mention of fine-tuning to speaker identities. Is it always ~3 days of training, or does fine-tuning to a custom voice take less time once you have a base pretrained model?

Thank you

tripathiarpan20 commented 11 months ago

I have the same question about finetuning from a base pretrained model to a custom voice!

LeXus5122 commented 9 months ago

interesting too

ksadov commented 9 months ago

The training procedure should work at different frequencies, given that source and target audio remain synced. The CPU voices for the official app are trained at 22.5kHz, for example. Training time is indeed longer for higher sample rates, and I can't guarantee convergence for datasets over 22.5kHz.
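If it helps, here's the kind of quick sanity check I'd run to confirm that source/target pairs stay synced after resampling (just a sketch with torchaudio, not something that ships with this repo; it assumes matching filenames in the two directories):

```python
from pathlib import Path

import torchaudio


def check_pairs(src_dir: Path, tgt_dir: Path, tol_samples: int = 0) -> None:
    """Assert that each source/target pair shares a sample rate and length."""
    for src_path in sorted(src_dir.glob("*.wav")):
        tgt_path = tgt_dir / src_path.name  # assumes matching filenames
        src_info = torchaudio.info(str(src_path))
        tgt_info = torchaudio.info(str(tgt_path))
        assert src_info.sample_rate == tgt_info.sample_rate, src_path.name
        assert abs(src_info.num_frames - tgt_info.num_frames) <= tol_samples, src_path.name
```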

As for finetuning to speaker identities: in personal experiments I've found the procedure to be a little faster, 1-2 days, but I can't give you a hard number.