bmaltais / kohya_ss


Flux.1 LoRA training #2701

Open bmaltais opened 3 months ago

bmaltais commented 3 months ago

Kohya has added preliminary support for Flux.1 LoRA to his SD3 branch. I have created an sd3-flux.1 branch and updated it to the latest sd-scripts sd3 branch code... No GUI integration yet... I will start adding the basic code needed for the GUI to recognize the model as Flux.

bmaltais commented 3 months ago

The branch now contains an MVP, but for some reason the flux1 trainer crashes with an "Optimizer argument list is empty" error.

bmaltais commented 3 months ago

But I am not sure why I keep getting this error when trying to train:

FLUX: Gradient checkpointing enabled.
prepare optimizer, data loader etc.
                    INFO     use 8-bit AdamW optimizer | {}    train_util.py:4342
override steps. steps for 4 epochs is / 指定エポックまでのステップ数: 320
enable fp8 training.
Traceback (most recent call last):
  File "D:\kohya_ss\sd-scripts\flux_train_network.py", line 395, in <module>
    trainer.train(args)
  File "D:\kohya_ss\sd-scripts\train_network.py", line 543, in train
    if hasattr(t_enc.text_model, "embeddings"):
  File "D:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'T5EncoderModel' object has no attribute 'text_model'

Maybe I really need to upgrade PyTorch to 2.4.0... not liking that, as it might bork my non-Flux.1 GUI... not feeling like upgrading...
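For what it's worth, the failing line assumes a CLIP-style encoder: CLIPTextModel wraps its transformer in a text_model attribute, while T5EncoderModel exposes encoder instead. Here is a minimal, hypothetical illustration of the difference (using a small T5 checkpoint for speed; Flux actually uses t5-v1_1-xxl); the real fix landed upstream in sd-scripts later in this thread:

```python
# Hypothetical, self-contained illustration of the failure: T5EncoderModel has
# no `text_model` attribute (its transformer lives under `.encoder`), so any
# unguarded `t_enc.text_model.<attr>` access raises AttributeError.
from transformers import CLIPTextModel, T5EncoderModel

clip = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
t5 = T5EncoderModel.from_pretrained("google/t5-v1_1-small")

for t_enc in (clip, t5):
    # Guarding on t_enc itself (not on t_enc.text_model.<attr>) is safe:
    if hasattr(t_enc, "text_model") and hasattr(t_enc.text_model, "embeddings"):
        print(type(t_enc).__name__, "is CLIP-style: text_model.embeddings exists")
    else:
        print(type(t_enc).__name__, "has no text_model; skip the CLIP-only path")
```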

BenDes21 commented 3 months ago

But I am not sure why I keep getting this error when trying to train:

FLUX: Gradient checkpointing enabled.
prepare optimizer, data loader etc.
                    INFO     use 8-bit AdamW optimizer | {}    train_util.py:4342
override steps. steps for 4 epochs is / 指定エポックまでのステップ数: 320
enable fp8 training.
Traceback (most recent call last):
  File "D:\kohya_ss\sd-scripts\flux_train_network.py", line 395, in <module>
    trainer.train(args)
  File "D:\kohya_ss\sd-scripts\train_network.py", line 543, in train
    if hasattr(t_enc.text_model, "embeddings"):
  File "D:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'T5EncoderModel' object has no attribute 'text_model'

Maybe I really need to upgrade PyTorch to 2.4.0... not liking that, as it might bork my non-Flux.1 GUI... not feeling like upgrading...

Hi there, is it possible to update PyTorch to 2.4.0 for only the Flux version of the Kohya_ss GUI?

protector131090 commented 3 months ago

SimpleTuner updated to v0.9.8: quantised Flux training in 40 GB... 24 GB... 16 GB... 13.9 GB... Waiting eagerly for Kohya.

BenDes21 commented 3 months ago

SimpleTuner updated to v0.9.8: quantised Flux training in 40 GB... 24 GB... 16 GB... 13.9 GB... Waiting eagerly for Kohya.

probably very soon

BenDes21 commented 3 months ago

But I am not sure why I keep getting this error when trying to train:

FLUX: Gradient checkpointing enabled.
prepare optimizer, data loader etc.
                    INFO     use 8-bit AdamW optimizer | {}    train_util.py:4342
override steps. steps for 4 epochs is / 指定エポックまでのステップ数: 320
enable fp8 training.
Traceback (most recent call last):
  File "D:\kohya_ss\sd-scripts\flux_train_network.py", line 395, in <module>
    trainer.train(args)
  File "D:\kohya_ss\sd-scripts\train_network.py", line 543, in train
    if hasattr(t_enc.text_model, "embeddings"):
  File "D:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'T5EncoderModel' object has no attribute 'text_model'

Maybe I really need to upgrade PyTorch to 2.4.0... not liking that, as it might bork my non-Flux.1 GUI... not feeling like upgrading...

Hi there! Any news about the integration of Flux into the GUI?

bmaltais commented 3 months ago

I am running into similar, but different, errors. I am waiting for the sd-scripts code to stabilize before working on this further. I already have a lot of the elements in the GUI; the missing ones can be added as extra parameters in the Advanced Accordion.

BenDes21 commented 3 months ago

I am running into similar, but different, errors. I am waiting for the sd-scripts code to stabilize before working on this further. I already have a lot of the elements in the GUI; the missing ones can be added as extra parameters in the Advanced Accordion.

Nice! It should be released soon then, can't wait to try it! Thanks for your work.

bmaltais commented 3 months ago

I updated to the latest sd-scripts commit for Flux... still can't run training on my end, unfortunately:

FLUX: Gradient checkpointing enabled.
prepare optimizer, data loader etc.
                    INFO     use 8-bit AdamW optimizer | {}    train_util.py:4346
override steps. steps for 4 epochs is / 指定エポックまでのステップ数: 320
enable fp8 training.
Traceback (most recent call last):
  File "D:\kohya_ss\sd-scripts\flux_train_network.py", line 397, in <module>
    trainer.train(args)
  File "D:\kohya_ss\sd-scripts\train_network.py", line 543, in train
    if hasattr(t_enc.text_model, "embeddings"):
  File "D:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'T5EncoderModel' object has no attribute 'text_model'

Here is a copy of my flux1_test.json config if you are interested in poking at it.

flux1_test.json

bmaltais commented 3 months ago

I pushed an update with support for the missing GUI parameters for Flux.1.

Here is the latest config for testing based on Kohya's readme config:

flux1_test.json

BenDes21 commented 3 months ago

I pushed an update with support for the missing GUI parameters for Flux.1.

Here is the latest config for testing based on Kohya's readme config:

flux1_test.json

thanks a lot

bmaltais commented 3 months ago

The GUI is a real mess with so many options. I get lost myself trying to find the option I need to set. I wish there were an easy solution… but I can't think of one.

WarAnakin commented 3 months ago

The GUI is a real mess with so many options. I get lost myself trying to find the option I need to set. I wish there were an easy solution… but I can't think of one.

bro, Flux does not support training the text encoder yet. Set your text encoder LR to 0. That should get you past the error.
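For anyone launching the script directly rather than through the GUI, here is a rough, untested sketch of what a run with the text encoder frozen might look like. All paths and hyperparameters are placeholders, and the flag set is an assumption based on sd-scripts conventions, so double-check against the sd3 branch README:

```python
# Rough sketch of a Flux LoRA launch with the text encoder frozen.
# Placeholder paths/values; verify flag names against sd-scripts' sd3 README.
import subprocess

cmd = [
    "accelerate", "launch", "sd-scripts/flux_train_network.py",
    "--pretrained_model_name_or_path", "D:/models/flux1-dev.safetensors",
    "--network_module", "networks.lora_flux",
    "--network_dim", "4",
    "--unet_lr", "1e-4",
    "--text_encoder_lr", "0",       # TE training is not supported for Flux yet
    "--network_train_unet_only",    # belt-and-suspenders: skip the TE entirely
    "--output_dir", "D:/output",
]
subprocess.run(cmd, check=True)
```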

WarAnakin commented 3 months ago

[three sample images attached]

Here are some images from a LoRA I trained.

TripleHeadedMonkey commented 3 months ago

I'm surprised that wasn't more widely known, actually. Stability AI, when they released SD3, mentioned that training the T5 model was not only unnecessary but not recommended.

The same is likely true for Flux. It simply relies on the tokenization from CLIP-L and the transformer model working in conjunction with the T5 model's established natural language processing.

And CLIP-L is almost entirely tag-based and seems highly unstable when trained anyway.

In other words, as long as you create the embedding within the model itself, the T5's existing capabilities should be enough to hit the ground running and incorporate that embedding into natural-language prompting right off the bat.

How exactly this translates to the end result is something I have yet to see myself, though.

protector131090 commented 3 months ago

The ai-toolkit script works really great. I've trained 3 LoRAs so far (in 3 hours); it's not perfect, but super good. Waiting for Kohya.

bmaltais commented 3 months ago

[three sample images attached]

Here are some images from a LoRA I trained.

Was it trained using kohya_ss? Great results.

jpXerxes commented 3 months ago

Is anyone getting past the AttributeError: 'T5EncoderModel' object has no attribute 'text_model'? I'm using the second version of flux1_test, which has the LR set to 0, and that doesn't do it.

b-7777777 commented 3 months ago

Is anyone getting past the AttributeError: 'T5EncoderModel' object has no attribute 'text_model'? I'm using the second version of flux1_test, which has the LR set to 0, and that doesn't do it.

Someone implemented a potential fix for it, but Kohya hasn't added it yet: https://github.com/kohya-ss/sd-scripts/issues/1453

FrakerKill commented 3 months ago

The GUI is a real mess with so many options. I get lost myself trying to find the option I need to set. I wish there were an easy solution… but I can't think of one.

bro, Flux does not support training the text encoder yet. Set your text encoder LR to 0. That should get you past the error.

Have you an example json?

jpXerxes commented 3 months ago

Someone implemented a potential fix for it, but Kohya hasn't added it yet: kohya-ss/sd-scripts#1453

Kohya has added it now. I pulled these files: library/flux_train_utils.py, flux_train_network.py, train_network.py, library/flux_models.py

but now I get:

  File "E:\kohya_ss\sd-scripts\flux_train_network.py", line 207, in sample_images
    accelerator, args, epoch, global_step, flux, ae, text_encoder, self.sample_prompts_te_outputs
AttributeError: 'FluxNetworkTrainer' object has no attribute 'sample_prompts_te_outputs'

stepfunction83 commented 3 months ago

I can confirm that I am getting the same AttributeError as @jpXerxes after cloning the latest sd3 branch.

Able to bypass the issue and begin training by adding --cache_text_encoder_outputs to the additional parameters!

jpXerxes commented 3 months ago

Able to bypass the issue and begin training by adding --cache_text_encoder_outputs to the additional parameters!

That did it. I ran the test file with sample outputs every epoch (4 epochs) and the prompt: a painting of a steam punk skull with a gas mask, by darius kawasaki

These are the 4 sample images:

[four sample images attached]

velmbi commented 3 months ago

How are you running the sd3_train.py script with kohya? I downloaded it but don't know what to do with it. I've always just used kohya normally but really want to try some flux training.

jpXerxes commented 3 months ago

How are you running the sd3_train.py script with kohya? I downloaded it but don't know what to do with it. I've always just used kohya normally but really want to try some flux training.

Likely you could wait a very short time until bmaltais catches up, but I'm no expert; here's what I did. First, make sure you have the proper branch of bmaltais/kohya using git checkout sd3-flux.1

Go to https://github.com/kohya-ss/sd-scripts/tree/sd3 and download the 4 files below, placing them in the appropriate folders: library/flux_train_utils.py, flux_train_network.py, train_network.py, library/flux_models.py

Grab the second flux1_test.json posted above in this thread. Edit it to change all hard-coded paths to your own structure. Near the top is the "additional_parameters": line; add --cache_text_encoder_outputs to it (a small script for this edit is sketched after these steps).

In the GUI, set a sample output frequency and add a sample prompt.

If anybody spots something wrong with this, please correct me!
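A minimal sketch of the config edit from the step above, assuming the exported kohya_ss JSON stores extra flags in an "additional_parameters" string field:

```python
# Append --cache_text_encoder_outputs to the config's additional parameters.
# Assumes the exported kohya_ss JSON uses an "additional_parameters" string.
import json

with open("flux1_test.json", encoding="utf-8") as f:
    cfg = json.load(f)

extra = cfg.get("additional_parameters", "")
if "--cache_text_encoder_outputs" not in extra:
    cfg["additional_parameters"] = (extra + " --cache_text_encoder_outputs").strip()

with open("flux1_test.json", "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2)
```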

jpXerxes commented 3 months ago

A general problem with my test run is that Flux already knows how to deal with that prompt, so I get good images even without the LoRA. I will have to find something Flux knows nothing about to test properly.

stepfunction83 commented 3 months ago

You can also remove the sd-scripts directory and replace it with the latest version of the sd3 branch.
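Something like this, presumably (run from the kohya_ss root; the URL is the upstream repo linked earlier in the thread):

```python
# Swap the bundled sd-scripts for a fresh clone of the upstream sd3 branch.
# Run from the kohya_ss root; this deletes any local changes in sd-scripts.
import shutil
import subprocess

shutil.rmtree("sd-scripts", ignore_errors=True)
subprocess.run(
    ["git", "clone", "-b", "sd3", "https://github.com/kohya-ss/sd-scripts.git", "sd-scripts"],
    check=True,
)
```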


stepfunction83 commented 3 months ago

With a 24GB card, I run out of VRAM after about 30 or so training steps.

jpXerxes commented 3 months ago

With a 24GB card, I run out of VRAM after about 30 or so training steps.

Hmm. Same 24GB here (4090), and it ran fine through 320 steps. ai-toolkit talks about using lowvram when you "only" have 24GB and some is being used for the display, but like I said, it ran fine here.

FrakerKill commented 3 months ago

With a 24GB card, I run out of VRAM after about 30 or so training steps. Hmm. Same 24GB here (4090), and it ran fine through 320 steps. ai-toolkit talks about using lowvram when you "only" have 24GB and some is being used for the display, but like I said, it ran fine here.

Haha, then with my 16GB VRAM (4060), good luck running it.

jpXerxes commented 3 months ago

Haha, then with my 16GB VRAM (4060), good luck running it.

Try changing the additional parameters from --highvram to --lowvram. I don't know if it will work, but it can't hurt to try.

velmbi commented 3 months ago

How are you running the sd3_train.py script with kohya? I downloaded it but don't know what to do with it. I've always just used kohya normally but really want to try some flux training. Likely you could wait a very short time until bmaltais catches up, but I'm no expert; here's what I did. First, make sure you have the proper branch of bmaltais/kohya using git checkout sd3-flux.1

Go to https://github.com/kohya-ss/sd-scripts/tree/sd3 and download the 4 files below, placing them in the appropriate folders: library/flux_train_utils.py, flux_train_network.py, train_network.py, library/flux_models.py

Grab the second flux1_test.json posted above in this thread. Edit it to change all hard-coded paths to your own structure. Near the top is the "additional_parameters": line; add --cache_text_encoder_outputs to it.

In the GUI, set a sample output frequency and add a sample prompt.

If anybody spots something wrong with this, please correct me!

That config flux1_test.json doesn't load anything for me within the Kohya GUI under the LoRA tab.

I've edited the file so that it has all local paths set up.

But what are you supposed to actually do with the file?

FrakerKill commented 3 months ago

Haha, then with my 16GB VRAM (4060), good luck running it.

Try changing the additional parameters from --highvram to --lowvram. I don't know if it will work, but it can't hurt to try.

No luck with the lowvram option:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.00 GiB. GPU 0 has a total capacty of 16.00 GiB of which 0 bytes is free. Of the allocated memory 41.91 GiB is allocated by PyTorch, and 3.96 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
steps:   0%|          | 0/56 [09:19<?, ?it/s]

flux1_test_lowvram.json

Attached is the JSON I use in the Kohya_ss GUI.
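For the record, the OOM message's own suggestion can also be tried: capping the allocator's split size before the training process initializes CUDA. The value below is a guess, not a tested setting:

```python
import os

# Untested mitigation straight from the PyTorch OOM message: limit the
# allocator's split size to reduce fragmentation. Must be set before CUDA
# is initialized (e.g. at the top of the launcher); 512 MB is a guess.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
```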

jpXerxes commented 3 months ago

I realized that I'm actually using a modified version of the first "test" json - I'm getting so many different test files that I can't keep them straight! Here's what I have that actually ran (I still can't say for sure it worked, as the samples probably ignored the LoRA): flux1_test_jpXerxes.json

stepfunction83 commented 3 months ago

Ah, I figured out the issue. I needed to change the resolution to 512,512 to align with the recommendation from SimpleTuner.

It's now running comfortably at 15.2 GB VRAM and at a training speed of 1.3 it/s, similar to SimpleTuner, so it should be viable (just barely) for 16 GB cards.

Checking/unchecking "highvram" didn't notably change the VRAM used.

Also, there is a checkbox for "cache_text_encoder_outputs"; you can use that instead of putting it in the extra parameters section. Edit: that doesn't seem to work, actually...

stepfunction83 commented 3 months ago

When generating samples, the default values seem to work well with euler_a, but they are 512x512.

Parameters which worked well for 1024x1024 were: --w 1024 --h 1024 --d 42 --l 4.0 --s 25

The important thing to note is that the guidance must be set to roughly 2 or higher, or the output will be garbage.
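Combined with the prompt used earlier in the thread, a sample-prompts entry would look something like this (the per-prompt flags follow sd-scripts' prompt-file syntax; the file name is whatever your config points at):

```python
# Write a hypothetical sample_prompts.txt entry using the 1024x1024 parameters
# above: --w/--h size, --d seed, --l guidance scale, --s steps.
line = ("a painting of a steam punk skull with a gas mask, by darius kawasaki "
        "--w 1024 --h 1024 --d 42 --l 4.0 --s 25")
with open("sample_prompts.txt", "w", encoding="utf-8") as f:
    f.write(line + "\n")
```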

FrakerKill commented 3 months ago

I realized that I'm actually using a modified version of the first "test" json - I'm getting so many different test files that I can't keep them straight! Here's what I have that actually ran (I still can't say for sure it worked, as the samples probably ignored the LoRA): flux1_test_jpXerxes.json

I can run it, but it always OOMs. There are good cleanups between caching the models, but when it gets into the training epochs, there it goes, eating VRAM and shared VRAM.

stepfunction83 commented 3 months ago

I realized that I'm actually using a modified version of the first "test" json - I'm getting so many different test files that I can't keep them straight! Here's what I have that actually ran (I still can't say for sure it worked, as the samples probably ignored the LoRA): flux1_test_jpXerxes.json

I can run it, but it always OOMs. There are good cleanups between caching the models, but when it gets into the training epochs, there it goes, eating VRAM and shared VRAM.

Change Resolution to 512,512 from 1024,1024

FrakerKill commented 3 months ago

I realized that I'm actually using a modified version of the first "test" json - I'm getting so many different test files that I can't keep them straight! Here's what I have that actually ran (I still can't say for sure it worked, as the samples probably ignored the LoRA): flux1_test_jpXerxes.json

I can run it, but it always OOMs. There are good cleanups between caching the models, but when it gets into the training epochs, there it goes, eating VRAM and shared VRAM.

Change Resolution to 512,512 from 1024,1024

Here?

[screenshot]

I did it too, but nothing changed: no sampling, resolution at 512, and max bucket at 1024. It seems to be Chrome.

Wait, with 512 and max bucket 1024:

[screenshot]

A present 😂 flux1_test_working.json

jpXerxes commented 3 months ago

I left mine with highvram but changed to 512 from 1024, and it went from 3.6 s/it to 1.1 it/s.

stepfunction83 commented 3 months ago

@FrakerKill Your training batch size is 5. Lower it to 1 and it should work.

velmbi commented 3 months ago

I'm using a little over half the VRAM on a 3090 when training at 512x512; 1024x1024 OOMs.

stepfunction83 commented 3 months ago

Also, in general, using Euler instead of Euler_a as the sampler results in MUCH better samples.

bmaltais commented 3 months ago

cache_text_encoder_outputs

I have fixed the issue where the option was not properly passed as a parameter. Finally training at 1.09 s/it on my 3090.

[screenshot]

DarkViewAI commented 3 months ago

You will basically need a quantized model, I think, for low VRAM.

bmaltais commented 3 months ago

Latest config that successfully trained for me:

flux1_test-v2.json

jpXerxes commented 3 months ago

Latest config that successfully trained for me:

flux1_test-v2.json

I modified mine to make most of the changes you have, and it failed - please try it with samples; I had to add "additional_parameters": " --cache_text_encoder_outputs" to make it work with a sample prompt.

stepfunction83 commented 3 months ago

Latest config that successfully trained for me: flux1_test-v2.json

I modified mine to make most of the changes you have, and it failed - please try it with samples; I had to add "additional_parameters": " --cache_text_encoder_outputs" to make it work with a sample prompt.

He fixed the issue with the checkbox before running his config. I imagine if you pull the latest version of the gui, it should work.

jpXerxes commented 3 months ago

He fixed the issue with the checkbox before running his config. I imagine if you pull the latest version of the gui, it should work.

My bad. I didn't see that anything had changed except incorporating the latest sd-scripts, so I failed to pull it.

velmbi commented 3 months ago

Not sure what I did wrong, but the 2000-step LoRA has no effect whatsoever when used at a weight of 1.0 in Comfy.