kunibald413 opened this issue 1 month ago
Sure, I keep forgetting to provide samples / a demo page for them.
However, I do need to find some speakers that the model has not trained against at all, since most of the speakers I do have already perform decently-ish under the base model alone.
I should have a pool of speakers I've culled from my dataset ages ago that I can probably make some LoRAs of. It shouldn't be too much of a pain to transcribe + process them tonight.
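A minimal sketch of what that transcription pass could look like, assuming openai-whisper as the transcriber (the paths are hypothetical, and the repo's actual processing pipeline may differ):

```python
# Minimal sketch: transcribe a folder of speaker clips with openai-whisper.
# Assumes `pip install openai-whisper`; paths are hypothetical and the
# repo's real transcription/processing pipeline may differ.
from pathlib import Path
import json
import whisper

model = whisper.load_model("base")  # larger models trade speed for accuracy

for wav in Path("speakers/").rglob("*.wav"):
    result = model.transcribe(str(wav))
    # Store the transcription next to the audio for later dataset processing.
    wav.with_suffix(".json").write_text(
        json.dumps({"text": result["text"].strip()}, ensure_ascii=False)
    )
```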
Quite a few unforeseen problems came up.
In the meantime, I'll provide the sample outputs for the Cyberpunk 2077 LoRAs, as they demonstrate LoRAs the best despite being half-baked: Cyberpunk.zip
It would've probably been better for me just to play around in the web UI and pick some outputs that sound fine.
Tomorrow (or Friday) I'll see about `vall_e.demo` and `vall_e.train --eval`, where both will source input text transcriptions from the validation dataset but input audio prompts from the training dataset, to give a better representation of real-world use.
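As a rough illustration of that sourcing split (the helper and record layout are hypothetical stand-ins, not the actual `vall_e.demo` internals):

```python
# Sketch: pair validation-set transcriptions with training-set audio prompts
# per speaker, so generation is conditioned on seen audio but unseen text.
# The dataset records ({"speaker", "text", "audio_path"}) are hypothetical.
import random

def build_demo_pairs(train_ds, val_ds):
    # Index training utterances by speaker for prompt lookup.
    prompts_by_speaker = {}
    for utt in train_ds:
        prompts_by_speaker.setdefault(utt["speaker"], []).append(utt["audio_path"])

    pairs = []
    for utt in val_ds:
        candidates = prompts_by_speaker.get(utt["speaker"])
        if not candidates:
            continue  # skip speakers without a training-set prompt
        pairs.append({
            "speaker": utt["speaker"],
            "text": utt["text"],                  # unseen transcription
            "prompt": random.choice(candidates),  # seen audio prompt
        })
    return pairs
```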
`vall_e.demo` will also handle comparing between un-LoRA'd output and LoRA'd output.

Alright, much better workflow now. I don't feel like I'm nearly losing my sanity as much as yesterday trying to get samples cobbled together. Although, unlike `vall_e.train --eval`, the demo page doesn't batch.

Samples for the LoRAs I have should be available here: https://vall-e-demo.ecker.tech/loras.html
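For the un-LoRA'd vs. LoRA'd comparison, here's a sketch of how such a toggle could look if the LoRA were managed through Hugging Face's peft; this project handles its LoRAs itself, so this is only illustrative, and the `generate` call is a stand-in for the model's actual interface:

```python
# Sketch: generate with and without a LoRA using peft's disable_adapter()
# context manager. Illustrative only; this project manages its own LoRAs,
# and generate(text=..., prompt=...) is a hypothetical model interface.
from peft import PeftModel

def compare_outputs(base_model, lora_path, text, prompt):
    model = PeftModel.from_pretrained(base_model, lora_path)

    with model.disable_adapter():
        baseline = model.generate(text=text, prompt=prompt)  # un-LoRA'd
    finetuned = model.generate(text=text, prompt=prompt)     # LoRA'd
    return baseline, finetuned
```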
I'll see if I can get more speakers trained and sampled, although I don't really have any ideas for speakers that aren't already in the training dataset.
Thank you, as always I appreciate your detailed documentation.
If possible, could you share audio samples of the LoRA finetunes?
Thank you for your time!