Maelstrom2014 opened 3 months ago
Hi there,
Please add an EXAMPLE of LoRA training with a dataset, and an EXAMPLE of using the resulting LoRA. Thanks a lot.
I'm still experimenting a bunch with this myself, but I'll prioritize adding some examples. As for the dataset preparation, I would defer to the examples in stable-audio-tools for the time being. I'll add some links to those.
Will train.py train only the LoRA weights?
train.py has all the same functionality as train.py in stable-audio-tools. Adding the --use-lora argument instructs it to freeze the base model and train only the LoRA weights.
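Conceptually, freezing the base model and training only the LoRA weights means the pretrained weight matrices stay untouched while small low-rank factors absorb all the updates. A minimal sketch of that idea (the names, shapes, and scaling here are illustrative, not LoRAW's actual implementation):

```python
import numpy as np

# Illustrative freeze-base / train-LoRA sketch: the base weight W is
# frozen; only the low-rank factors A and B would receive gradients.
rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 64, 32, 16, 16

W = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init

def lora_forward(x):
    # Base output plus the scaled low-rank correction B @ (A @ x).
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, training starts exactly at the base
# model's behavior: the LoRA contributes nothing until B is updated.
assert np.allclose(lora_forward(x), W @ x)
```

Zero-initializing one of the two factors is the standard LoRA trick that makes the adapted model start out identical to the frozen base.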
What's the size of a LoRA, and how many epochs does it take to train?
I'm still figuring out the best hyperparameters to use. With Stable Audio Open, a rank 16 LoRA is ~59 MB and a rank 128 LoRA is ~472 MB. With a batch size of 32 and a sample length of 10 s, I am getting good results in the several-thousand-steps range (a few hours on my setup).
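Checkpoint size should scale roughly linearly with rank, since each adapted layer adds about rank * (d_in + d_out) parameters. A quick sanity check against the reported sizes:

```python
# LoRA adds ~rank * (d_in + d_out) parameters per adapted layer, so
# checkpoint size scales about linearly with rank. Reported: rank 16
# is ~59 MB; predict the rank 128 size from the 8x rank increase.
size_rank16_mb = 59
rank_ratio = 128 / 16                              # 8x the rank
predicted_rank128_mb = size_rank16_mb * rank_ratio
print(predicted_rank128_mb)                        # 472.0
```

The prediction of 472 MB matches the reported ~472 MB, consistent with linear scaling in rank.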
How much GPU VRAM does it need?
I am fine-tuning Stable Audio Open with a batch size of 32 and a sample length of 10 seconds. On a 4070 Ti Super in Windows 11, it is taking a bit less than 10 GB for rank 16 and 13 GB for rank 128. It may be more efficient on Linux -- I was getting less than 8 GB while experimenting with rank 16 on Colab.
Any news about examples?
The sticking point in the README is this line:
https://github.com/NeuralNotW0rk/LoRAW/blob/main/README.md?plain=1#L21
I have tried the configs here with the Stable Audio Open 1.0 checkpoint as the starting point with the command:
python ./train.py --dataset-config ./datasets.json --model-config ./lorawfinetune-config.json --pretrained-ckpt-path ./stable-audio-open-1.0/model.ckpt --use-lora true
But it fails to run entirely in this repo. In the main repo, the fine-tuned model returns noise when using the laion_clap checkpoint.
I think I can share some audio examples in my next commit. In your original question, are you asking for an actual dataset and a LoRA checkpoint?
The config in https://github.com/NeuralNotW0rk/LoRAW/blob/main/examples/model_config.json is what I use with Stable Audio Open. You should be able to add a "lora" section to the end of any model config. The `// ... args, model, training, etc. ...` line was meant as a placeholder for the rest of the model_config, but if it is confusing, I could just point the reader to the full example config instead.
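For reference, the shape being described is something like the following sketch. The `//` line (copied from the README's placeholder convention) is not valid JSON, and the field names inside "lora" are illustrative guesses -- the linked examples/model_config.json is the authoritative schema:

```json
{
  // ... args, model, training, etc. from the base model_config ...
  "lora": {
    "rank": 16,
    "alpha": 16
  }
}
```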
Ah yes, it wasn't originally clear to me which config was used to train Stable Audio Open, but I actually found the model_config.json file on Hugging Face independently just now!
What is the maximum recommended length for each wav in the dataset?
Where can I find trained LoRAs for testing? Thanks all!