ThereforeGames / blora_for_kohya

Tools needed for training B-LoRA method with sd-scripts.

Q: AdamW LR? #2

Open kuzman123 opened 2 months ago

kuzman123 commented 2 months ago

I tried the Prodigy optimizer, and it is exactly as you wrote: reaaallly slow convergence. I trained the model for 120 epochs and could easily have trained another 60. I want to give AdamW a try, so I'm wondering which LR you suggest. BTW, thanks a LOT for the detailed explanation and the tools for this; nowadays it is very rare that someone posts such detailed instructions!

ThereforeGames commented 2 months ago

Hi @kuzman123,

Thank you for reaching out, and for the kind words. πŸ™‚

I haven't tested B-LoRA with AdamW much, but the authors of the technique suggest a learning rate of 5e-5 for it.

It's possible that, due to the blocks we're targeting, B-LoRA will simply take longer to converge regardless of the optimizer.

That said, I have seen it mentioned that use_bias_correction=True may be responsible for the slowdown.

Indeed, disabling this flag appears to speed up convergence by 3-4x... but it is not clear to me yet if doing so comes at a hit to quality. I think it might, but more testing is needed.
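
For intuition, here is a rough stdlib sketch of an Adam-style bias-correction multiplier (my reading of the general idea, not necessarily prodigyopt's exact code; see https://github.com/konstmish/prodigy for the authoritative implementation). The factor stays well below 1.0 for much of training, which is consistent with slower convergence while the flag is enabled:

```python
def bias_correction_multiplier(step: int, beta1: float = 0.9, beta2: float = 0.999) -> float:
    """Adam-style bias-correction factor applied to the step size.

    Rough sketch of the idea behind Prodigy's use_bias_correction flag;
    check the prodigy repo for the code that actually runs.
    """
    t = step + 1  # the correction terms use 1-indexed steps
    return (1 - beta2 ** t) ** 0.5 / (1 - beta1 ** t)

# Early and mid-training steps are damped well below 1.0;
# the factor only approaches 1.0 late in training.
for step in (0, 10, 100, 1000, 10000):
    print(step, round(bias_correction_multiplier(step), 3))
```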

I hope that helps!

kuzman123 commented 2 months ago

It takes 150 epochs for full convergence. And it is like you said: it is HARD to overtrain it; it seems totally immune to overfitting. From now on I will tune only B-LoRA, since it's superior to the other LoRA approaches I've tried. BTW, I was getting an error when I tried AdamW (optimizer arguments). Prodigy is OK, it just needs a few more epochs. I've posted one of my B-LoRAs on Civitai: https://civitai.com/models/564049/ferenc-hodi-style Thnx again, you are my hero
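
A likely cause of that AdamW error (a guess, not confirmed): sd-scripts forwards --optimizer_args to the optimizer constructor as keyword arguments, and AdamW does not accept Prodigy-only keys such as d_coef or use_bias_correction. A sketch of that failure mode, with adamw_stub standing in for the real torch.optim.AdamW:

```python
def adamw_stub(params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
               weight_decay=1e-2, amsgrad=False):
    """Stand-in mirroring torch.optim.AdamW's core keyword signature
    (illustration only, not the real torch API)."""
    return dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)

# Prodigy-only arguments are rejected by an AdamW-style signature:
try:
    adamw_stub([], lr=5e-5, d_coef=2, use_bias_correction=True)
except TypeError as exc:
    print("TypeError:", exc)
```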

ThereforeGames commented 2 months ago

Yeah, the near-immunity to overfitting makes the slower training time worth it in my book!

I've posted one of my B-LoRAs on Civitai.

This is the first style B-LoRA I've seen on Civitai. Very cool.

Have you tried the blora_content_layout_style.toml preset for comparison? It might do a better job of isolating style, although I've found it's not great for content.

Thnx again, you are my hero

I just put a few scripts together; the authors of B-LoRA are the real heroes here: Yarden Frenkel, Yael Vinker, Ariel Shamir and Daniel Cohen-Or. πŸ™‚

kuzman123 commented 2 months ago

My very first B-LoRA run was blora_content_layout_style.toml, but it was trained for 100 epochs and with cosine it ended up undertrained. After that I trained content/style. I want to try 1-image style training with 128 dim/alpha; if it's good, I will post it on civitai :-) I wonder how to "combine" content and style in ComfyUI, like in the original inference:

python inference.py --prompt="A <c> made of gold" --content_B_LoRA="<path/to/content_B-LoRA>" --output_path="<path/to/output_dir>"

I will try with a lora stacker and prompt attention.

ThereforeGames commented 2 months ago

I wonder how to "combine" content and style in ComfyUI, like in the original inference

So, the original inference.py script performs LoRA block splitting at the time of inference. We can't do this in A1111 or Comfy without additional tools (for example, someone wrote a Comfy node to load B-LoRAs).

My blora_slicer.py performs the same operation; it just saves out the modified file for ease of use. Aside from compatibility problems, I think it's inefficient to split the blocks every time you want to use the LoRA. In my opinion, it's better to just stack them independently:

a photo of mycontent in the style of mystyle<lora:mycontent:1.0><lora:mystyle:1.0>

This should yield identical results to B-LoRA's original approach.
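
For anyone curious, the slicing step amounts to filtering the LoRA state dict by block name: keep only the tensors whose keys mention the target block. A minimal sketch with made-up key names (the real keys follow kohya's lora_unet_* naming; check blora_slicer.py for the exact substrings it matches):

```python
def slice_lora(state_dict, keep_substrings):
    """Keep only the LoRA tensors whose key names one of the target blocks."""
    return {k: v for k, v in state_dict.items()
            if any(s in k for s in keep_substrings)}

# Toy state dict with made-up key names and dummy values:
toy = {
    "lora_unet_blockA.lora_down.weight": 1,
    "lora_unet_blockA.lora_up.weight": 2,
    "lora_unet_blockB.lora_down.weight": 3,
}

# Keeping only "blockA" drops the blockB tensor:
content_only = slice_lora(toy, ["blockA"])
print(sorted(content_only))
```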

kuzman123 commented 2 months ago

Oh nice. I will run some tests today and post the results here.

kuzman123 commented 2 months ago

Check this out: https://civitai.com/models/568351 IMHO it's the best way to use B-LoRA. If you want to play with it, make sure you download the files from the link I've posted: ComfyUI workflows plus all the captions used for training. I hope you like it!

kuzman123 commented 2 months ago

@ThereforeGames With an RTX 3090 you can replace --mem_eff_attn with --xformers. It speeds up training about 3x, and I can't see any difference in overall quality.

ThereforeGames commented 2 months ago

Interesting, thanks.

I have added an sdxl_blora_fast.bat preset with --xformers and use_bias_correction=False. I'll run some tests of my own to see how it impacts quality.

Let me know if you have any further suggestions for training settings - the original preset will remain available as sdxl_blora_classic.bat.

kuzman123 commented 2 months ago

What is the correlation between xformers and bias correction? I have very little theoretical knowledge of machine learning, while having the curiosity of a cat.

ThereforeGames commented 2 months ago

Not necessarily any correlation; it's just that bias correction also slows down training and may not provide big (if any) benefits to quality.

kuzman123 commented 2 months ago

Let me know if you have any further suggestions for training settings - the original preset will remain available as sdxl_blora_classic.bat.

As for the Prodigy parameters: most of my LoRAs are trained with slightly different values (weight_decay=0.01, d_coef=2). The difference I noticed is that my LoRAs often overtrain, which is not the case with B-LoRA and your parameters. I guess weight_decay=0.0001 affects that? I'm asking because I know a lot of people struggle with Prodigy and its tendency to overtrain quickly; although it is adaptive, better results are very often obtained with AdamW and the default LR.

MIDORIINKO commented 2 months ago

Hi! First of all, thank you for making this wonderful code. ☺️

I usually use another trainer, but I often use Prodigy with these args: weight_decay=0.01 betas=(0.9,0.99) eps=1e-08 decouple=True use_bias_correction=True safeguard_warmup=False beta3=None d_coef=1 (from https://github.com/konstmish/prodigy)

I usually set use_bias_correction=True, but I will try False. 🧐
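
For reference, sd-scripts takes these as space-separated key=value pairs via --optimizer_args. A rough sketch (not sd-scripts' actual parsing code) of how such a string becomes the kwargs the optimizer constructor receives:

```python
import ast

def parse_optimizer_args(arg_string):
    """Parse space-separated key=value pairs into optimizer kwargs.

    Rough sketch of the idea; not sd-scripts' real implementation."""
    kwargs = {}
    for pair in arg_string.split():
        key, _, value = pair.partition("=")
        try:
            kwargs[key] = ast.literal_eval(value)  # numbers, tuples, True/None
        except (ValueError, SyntaxError):
            kwargs[key] = value  # fall back to the raw string
    return kwargs

args = parse_optimizer_args(
    "weight_decay=0.01 betas=(0.9,0.99) eps=1e-08 decouple=True "
    "use_bias_correction=True safeguard_warmup=False beta3=None d_coef=1"
)
print(args)
```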

ThereforeGames commented 1 month ago

Hey @kuzman123 and @MIDORIINKO,

Hope you're both doing well. I had a chance to test some of your recommended changes for sdxl_blora_fast.bat. Here are my findings:

I would like to collect more research on the following, from the Prodigy README:

If you want to force the method to estimate a smaller or larger learning rate, it is better to change the value of d_coef (1.0 by default). Values of d_coef above 1, such as 2 or 10, will force a larger estimate of the learning rate; set it to 0.5 or even 0.1 if you want a smaller learning rate.

In Prodigy, a combination of high weight_decay and high d_coef can improve convergence times but will indeed create risk of overfitting. The current preset values are extremely resistant to overfitting, but are perhaps on the safe/slow side.

EDIT: For the record, I just tested weight_decay=0.01 and found that it produced more anatomical issues and inferior likeness when compared to the default value of 1e-04. I imagine using d_coef=2 would further exacerbate these problems.
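
To make the d_coef tradeoff concrete, here is a toy sketch of the relationship the Prodigy README describes: the effective step size scales with the estimated d times d_coef (a simplification for intuition, not Prodigy's actual update rule):

```python
def effective_step(d_estimate, lr=1.0, d_coef=1.0):
    """Toy model: Prodigy's step size scales as d * d_coef * lr.

    Simplification of the README's description, not the real update rule."""
    return d_estimate * d_coef * lr

# With the same estimated d, d_coef=2 doubles the effective learning rate,
# which speeds convergence but raises the risk of overfitting:
base = effective_step(1e-4)              # preset-style, d_coef=1
aggressive = effective_step(1e-4, d_coef=2.0)
print(base, aggressive)
```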