cloneofsimo / lora

Using Low-rank adaptation to quickly fine-tune diffusion models.
https://arxiv.org/abs/2106.09685
Apache License 2.0

So many parameters please give me some direction I want to prepare best tutorial for the general public #207

Open FurkanGozukara opened 1 year ago

FurkanGozukara commented 1 year ago

I am using the Dreambooth extension of AUTOMATIC1111, which is what people love. But there are simply so many parameters to set, which leads to very bad results.

1st parameter: Unfreeze Model. We are already training the UNet and the text encoder along with the token vectors, so what does Unfreeze Model do? The description says: "Unfreezes model layers and allows for potentially better training, but makes increased VRAM usage more likely."

2nd parameter: LoRA UNet Learning Rate and LoRA Text Encoder Learning Rate. I am asking about learning rates for the 8-bit Adam optimizer (if you know good ones for the new Lion optimizer, please share those too). I really need some baseline learning rates for both teaching a subject and teaching a style.
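One thing worth noting about these two settings: trainers typically pass the UNet and text encoder weights to the optimizer as separate parameter groups, each with its own learning rate. The values below are hypothetical community starting points, not settings from this repo; treat them only as a baseline to tune from.

```python
# Hypothetical baseline values (not from this repo): the text encoder is
# commonly given a lower learning rate than the UNet.
unet_lr = 1e-4          # LoRA UNet learning rate
text_encoder_lr = 5e-5  # LoRA text encoder learning rate

# A PyTorch-style optimizer would receive separate parameter groups, so each
# set of LoRA weights trains with its own rate. Strings stand in here for the
# actual parameter lists.
param_groups = [
    {"params": "unet_lora_params", "lr": unet_lr},
    {"params": "text_encoder_lora_params", "lr": text_encoder_lr},
]
```

This per-group setup is why the UI exposes two separate learning-rate fields instead of one.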

3rd parameter: Freeze CLIP Normalization Layers. If CLIP is frozen, does that mean only the token vectors are trained, as in textual inversion? Am I correct? The description says: "Keep the normalization layers of CLIP frozen during training. Advanced usage, may increase model performance and editability."

4th parameter: Step Ratio of Text Encoder Training. Text encoder training trains both the text encoder layers and the token vectors together, right? If this is 0, does that mean only the UNet is trained?
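As I understand it (this is an assumption about the extension, not confirmed here), a "step ratio" setting like this means the text encoder is only trained for that fraction of the total steps, after which training continues on the UNet alone. A minimal sketch of that interpretation:

```python
# Hypothetical reading of "Step Ratio of Text Encoder Training": the text
# encoder trains only during the first `ratio` fraction of all steps.
def trains_text_encoder(step: int, total_steps: int, ratio: float) -> bool:
    """Return True while text-encoder training is still active at `step`."""
    return step < int(ratio * total_steps)

# ratio 0.0 -> the text encoder is never trained (UNet only):
assert not trains_text_encoder(0, 1000, 0.0)
# ratio 0.5 -> trained for the first half of the run, then frozen:
assert trains_text_encoder(499, 1000, 0.5)
assert not trains_text_encoder(500, 1000, 0.5)
```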

5th parameter: AdamW Weight Decay. How should this be set, and by what rule? From what I read, it is supposed to regularize the weights during training, but what value should it have, and based on what?
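For context on what this knob does: AdamW applies *decoupled* weight decay, meaning each step every weight is shrunk toward zero by `lr * weight_decay * w`, separately from the gradient-based Adam update. A minimal sketch of that decay term (the specific numbers are illustrative, not recommendations):

```python
# Decoupled weight decay as used by AdamW: the decay is applied directly to
# the weight, independent of the Adam gradient update.
def decayed(weight: float, lr: float, weight_decay: float) -> float:
    return weight - lr * weight_decay * weight

# With lr=1e-4 and weight_decay=0.01, a weight of 1.0 shrinks by 1e-6 per
# step -- a very gentle pull toward zero.
w = decayed(1.0, lr=1e-4, weight_decay=0.01)
```

So the value controls regularization strength: larger values pull the weights toward zero faster, which can reduce overfitting on small datasets at the cost of slower learning.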

6th parameter: LoRA UNet Rank and LoRA Text Encoder Rank. By what rule should these ranks be set? Does each rank correspond to a vector of length 768, or something else? And what should their ratio be, e.g. the UNet rank 4x the text encoder rank?
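On what "rank" means: in LoRA (per the paper linked at the top), a weight update for a layer is approximated as `delta_W = B @ A`, where `A` is `r x d_in` and `B` is `d_out x r`; the rank `r` is that shared inner dimension, not a 768-length vector by itself. A small sketch of the resulting parameter count, using 768 (mentioned above) as an example dimension:

```python
# Parameter count of a LoRA adapter for one d_out x d_in weight matrix:
# B is d_out x r, A is r x d_in, so the adapter adds d_out*r + r*d_in params.
def lora_params(d_out: int, d_in: int, r: int) -> int:
    return d_out * r + r * d_in

# For a 768x768 projection at rank 4, the adapter has 6144 parameters,
# versus 589,824 for the full matrix:
assert lora_params(768, 768, 4) == 6144
assert lora_params(768, 768, 4) < 768 * 768
```

Higher rank gives the adapter more capacity (and size) roughly linearly; there is no universal rule for the UNet-to-text-encoder ratio that I know of.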

Thank you so much for any answers. I get asked all the time how LoRA works best, but there are just too many parameters and no starting hints for testing.

Can you give me some optimal parameters for both training a subject and training a style?

rishabhjain commented 1 year ago

You can see some examples of good sets of training params in the training scripts here:

https://github.com/cloneofsimo/lora/tree/master/training_scripts

felixsanz commented 1 year ago

[quotes FurkanGozukara's original question above]

I'm also curious about many of these parameters! Have you already figured some of them out? Thanks

FurkanGozukara commented 1 year ago

[quotes the original question and felixsanz's reply above]

I tested many settings for Dreambooth training, but not for LoRA.

Here is a video:

20.) Automatic1111 Web UI - PC - Free Epic Web UI DreamBooth Update - New Best Settings - 10 Stable Diffusion Training Compared on RunPods

felixsanz commented 1 year ago

[quotes FurkanGozukara's reply above]

Holy... $&%!. That's a lot of work. I watched the whole video because it was so interesting, but I got kind of lost at some point. At the end you should include conclusions, like "this is worth it and this isn't", because I can't judge your photos (I don't know you in person) and that kind of thing. In your opinion, what is worth activating? test6 and test9? (When training a face, that is; I know training a style will benefit from things like the Lion optimizer.) Big thanks for the content, I subscribed to the channel.