NeuralNotW0rk / LoRAW

Flexible LoRA Implementation to use with stable-audio-tools
MIT License

EXAMPLE Lora training and EXAMPLE lora #7

Maelstrom2014 opened this issue 3 months ago

Maelstrom2014 commented 3 months ago

Hi!

1. Please add an EXAMPLE LoRA training run with a dataset, and an EXAMPLE LoRA that uses it. Thanks a lot.
2. Will train.py train only the LoRA?
3. What is the size of a LoRA, and how many epochs should it be trained?
4. How much GPU VRAM does it need?

NeuralNotW0rk commented 3 months ago

Hi there,

Please add an EXAMPLE LoRA training run with a dataset, and an EXAMPLE LoRA that uses it. Thanks a lot.

I'm still experimenting a bunch with this myself, but I'll prioritize adding some examples. As for the dataset preparation, I would defer to the examples in stable-audio-tools for the time being. I'll add some links to those.

Will train.py train only the LoRA?

train.py has all the same functionality as train.py in stable-audio-tools. Adding the --use-lora argument instructs it to freeze the base model and train only the LoRA weights.

What is the size of a LoRA, and how many epochs should it be trained?

I'm still figuring out the best hyperparameters to use. With Stable Audio Open, a rank 16 LoRA is ~59 MB and a rank 128 LoRA is ~472 MB. With a batch size of 32 and a sample length of 10 s, I am getting good results in the several-thousand-step range (a few hours on my setup).
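Those checkpoint sizes scale linearly with rank, which matches how LoRA works: each adapted weight matrix W (m x n) gains a rank-r factor pair A (m x r) and B (r x n), so the added parameter count per layer is proportional to r. A minimal sketch (the function name is illustrative, not part of this repo):

```python
def lora_params(m, n, r):
    """Parameters added by one rank-r LoRA pair for an m x n weight matrix."""
    return r * (m + n)

# Sizes reported above for Stable Audio Open: rank 16 -> ~59 MB, rank 128 -> ~472 MB.
# The size ratio matches the rank ratio, as the linear model predicts.
size_ratio = 472 / 59      # -> 8.0
rank_ratio = 128 // 16     # -> 8
print(round(size_ratio), rank_ratio)
```

So, as a rule of thumb, you can estimate the checkpoint size for a new rank by scaling a known one proportionally.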

How much GPU VRAM it needs?

I am fine-tuning Stable Audio Open with a batch size of 32 and a sample length of 10 seconds. On a 4070 Ti Super on Windows 11, it takes a bit less than 10 GB for rank 16 and 13 GB for rank 128. It may be more efficient on Linux -- I was getting less than 8 GB while experimenting with rank 16 on Colab.

Maelstrom2014 commented 2 months ago

Any news about the examples?

benbowler commented 2 months ago

The sticking point in the README is this line:

https://github.com/NeuralNotW0rk/LoRAW/blob/main/README.md?plain=1#L21

I have tried the configs here with the Stable Audio Open 1.0 checkpoint as the starting point with the command:

```
python ./train.py --dataset-config ./datasets.json --model-config ./lorawfinetune-config.json --pretrained-ckpt-path ./stable-audio-open-1.0/model.ckpt --use-lora true
```

But it fails to run at all in this repo. In the main repo, the fine-tuned model returns noise when using the laion_clap checkpoint.

NeuralNotW0rk commented 2 months ago

Any news about the examples?

I think I can share some audio examples in my next commit. In your original question, are you asking for an actual dataset and a LoRA checkpoint?

NeuralNotW0rk commented 2 months ago

The sticking point in the README is this line:

https://github.com/NeuralNotW0rk/LoRAW/blob/main/README.md?plain=1#L21

I have tried the configs here with the Stable Audio Open 1.0 checkpoint as the starting point with the command:

```
python ./train.py --dataset-config ./datasets.json --model-config ./lorawfinetune-config.json --pretrained-ckpt-path ./stable-audio-open-1.0/model.ckpt --use-lora true
```

But it fails to run at all in this repo. In the main repo, the fine-tuned model returns noise when using the laion_clap checkpoint.

The config in https://github.com/NeuralNotW0rk/LoRAW/blob/main/examples/model_config.json is what I use with Stable Audio Open. You should be able to add a "lora" section to the end of any model config. The `// ... args, model, training, etc. ...` line was meant as a placeholder for the rest of the model config, but if it is confusing, I could just point the reader to the full example config instead.
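To illustrate the placement being described, here is a minimal sketch. The `// ...` line is a placeholder in the README's own style (JSON itself does not allow comments), and `"rank": 16` is an illustrative assumption rather than a documented key -- consult examples/model_config.json for the actual schema:

```json
{
    // ... args, model, training, etc. ... (the rest of a standard stable-audio-tools model config)
    "lora": {
        "rank": 16
    }
}
```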

benbowler commented 2 months ago

Ah yes, it wasn't originally clear to me which config was used to train Stable Audio Open, but I actually found the model_config.json file on Hugging Face independently just now!

GoombaProgrammer commented 1 month ago

What is the maximum recommended length for each WAV file in the dataset?

Maelstrom2014 commented 3 weeks ago

Where can I find trained LoRAs for testing? Thanks all!