bmaltais / kohya_ss

unable to train sdxl lora in kohya anymore #2098

Closed rafstahelin closed 6 months ago

rafstahelin commented 8 months ago

Recently I am no longer able to train. VRAM usage for a simple LoRA at batch size 1 with gradient checkpointing has gone above my 4090's capacity. I have the optimizers on, but I get either an OOM message on Kohya v23 or 50 hours of training for what used to be 3 hours.

I am training 20 images at a resolution of 1024x1280.

What has changed so that we can no longer train a simple LoRA, when it was so easy before?

Attachment: message.txt

Here's my JSON for training:

{ "adaptive_noise_scale": 0, "additional_parameters": "--max_grad_norm=0", "bucket_no_upscale": true, "bucket_reso_steps": 64, "cache_latents": true, "cache_latents_to_disk": true, "caption_dropout_every_n_epochs": 0.0, "caption_dropout_rate": 0, "caption_extension": ".txt", "clip_skip": "1", "color_aug": false, "enable_bucket": true, "epoch": 20, "flip_aug": false, "full_bf16": false, "full_fp16": false, "gpu_ids": "", "gradient_accumulation_steps": 1, "gradient_checkpointing": true, "keep_tokens": 1, "learning_rate": 0.0001, "learning_rate_te": 1e-05, "learning_rate_te1": 1e-05, "learning_rate_te2": 5e-05, "logging_dir": "/workspace/training_t1k0", "lr_scheduler": "cosine", "lr_scheduler_args": "", "lr_scheduler_num_cycles": "", "lr_scheduler_power": "", "lr_warmup": 5, "max_bucket_reso": 2048, "max_data_loader_n_workers": "0", "max_resolution": "1024,1280", "max_timestep": 1000, "max_token_length": "75", "max_train_epochs": "", "max_train_steps": "", "mem_eff_attn": false, "min_bucket_reso": 256, "min_snr_gamma": 0, "min_timestep": 0, "mixed_precision": "bf16", "model_list": "custom", "multi_gpu": false, "multires_noise_discount": 0.2, "multires_noise_iterations": 8, "no_token_padding": false, "noise_offset": 0.0357, "noise_offset_type": "Original", "num_cpu_threads_per_process": 4, "num_machines": 1, "num_processes": 1, "optimizer": "AdamW8bit", "optimizer_args": "", "output_dir": "/workspace/stable-diffusion-webui/models/Lora", "output_name": "t1k0_v01_ds20notrig_rep5_bs1", "persistent_data_loader_workers": false, "pretrained_model_name_or_path": "/workspace/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors", "prior_loss_weight": 1.0, "random_crop": false, "reg_data_dir": "/workspace/regs", "resume": "", "sample_every_n_epochs": 0, "sample_every_n_steps": 50, "sample_prompts": "a photo of t1k0 woman, looking at viewer, black eyes, on the street, jewelry, upper body,wearing tights and jacket, portrait, close up, fisheye --w 1024 --h 1280 --d -1 --l 7.5 --s 20\na photo of t1k0 woman, looking at viewer, curly hair, black eyes, on the street, jewelry, upper body,wearing tights and jacket, portrait, close up --w 1024 --h 1280 --d -1 --l 7.5 --s 20", "sample_sampler": "dpm_2", "save_every_n_epochs": 1, "save_every_n_steps": 0, "save_last_n_steps": 0, "save_last_n_steps_state": 0, "save_model_as": "safetensors", "save_precision": "bf16", "save_state": false, "scale_v_pred_loss_like_noise_pred": false, "sdxl": true, "seed": "12345", "shuffle_caption": false, "stop_text_encoder_training": 0, "train_batch_size": 1, "train_data_dir": "/workspace/training_t1k0/20notrig-5reps", "use_wandb": true, "v2": false, "v_parameterization": false, "v_pred_like_loss": 0, "vae": "", "vae_batch_size": 0, "weighted_captions": false, "xformers": "xformers"

bmaltais commented 8 months ago

Let's see if we can zero in on the source of this. What was the last version that worked? Let me know which release it was and then we can compare the requirements module list between the new v23.0.x release and that one... I suspect one of the updated modules might be consuming more VRAM... or perhaps an update to kohya's sd-scripts is now consuming more VRAM... or a combination of both...

rafstahelin commented 8 months ago

I believe 21.8.7 was still good

rafstahelin commented 8 months ago

Maybe a bitsandbytes update issue for AdamW8bit?
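
For reference, a quick way to check which bitsandbytes version is actually installed in the kohya_ss venv (paths assume the default install layout):

```
# from the kohya_ss folder, with the venv activated
pip show bitsandbytes
# or, if the module exposes __version__:
python -c "import bitsandbytes; print(bitsandbytes.__version__)"
```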

bmaltais commented 8 months ago

> I believe 21.8.7 was still good

WOW, this is an old release... Yeah, sd-scripts has changed so much since then that the difference most probably stems from there. You will probably have to stick to that release.

rafstahelin commented 8 months ago

Actually, I am on 22.6.0 locally and it works.

The problem is really with Ashley's Ultimate SD template on Runpod, which uses v23.0.11.

I will let him know. But it would be good to sort out this VRAM issue, as your new GUI is really cool.

bmaltais commented 8 months ago

Does v22.6.2 work? If it does not, then the issue is with the sd-scripts update... if v22.6.2 works but it no longer works in v23.0.x, then it is related to the new requirements in v23...

Can you confirm v22.6.2 is working or failing from a VRAM consumption point of view?
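
One way to do that requirements comparison, assuming the release tags follow the usual vXX.Y.Z naming and the requirements files keep their names across versions (both are assumptions, adjust as needed):

```
# diff the pinned dependencies between two kohya_ss releases
git diff v22.6.2 v23.0.11 -- requirements.txt requirements_windows.txt
```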

rafstahelin commented 8 months ago

Hey Bernard, I am getting this error when I install 22.6.2:

```
Traceback (most recent call last):
  File "E:\kohya2262\kohya_ss\sdxl_train.py", line 792, in <module>
    train(args)
  File "E:\kohya2262\kohya_ss\sdxl_train.py", line 354, in train
    _, _, optimizer = train_util.get_optimizer(args, trainable_params=params_to_optimize)
  File "E:\kohya2262\kohya_ss\library\train_util.py", line 3618, in get_optimizer
    raise ImportError("No bitsandbytes / bitsandbytesがインストールされていないようです")
ImportError: No bitsandbytes / bitsandbytesがインストールされていないようです

Traceback (most recent call last):
  File "C:\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\kohya2262\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "E:\kohya2262\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "E:\kohya2262\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "E:\kohya2262\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\kohya2262\kohya_ss\venv\Scripts\python.exe', './sdxl_train.py', '--max_grad_norm=0', '--bucket_no_upscale', '--bucket_reso_steps=64', '--cache_latents', '--cache_latents_to_disk', '--caption_extension=.txt', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--gradient_checkpointing', '--keep_tokens=1', '--learning_rate=0.0001', '--learning_rate_te1=1e-05', '--learning_rate_te2=1e-05', '--logging_dir=E:\studio Dropbox\studio\ai\data\subjects\TikoUnconsciousnees\3training\log', '--lr_scheduler=cosine', '--lr_scheduler_num_cycles=20', '--lr_warmup_steps=200', '--max_data_loader_n_workers=0', '--resolution=1024,1280', '--max_train_steps=4000', '--mixed_precision=bf16', '--noise_offset=0.0357', '--optimizer_type=AdamW8bit', '--output_dir=E:\studio Dropbox\studio\ai\libs\SD\2loras\1subjects\t1k0', '--output_name=t1k0_00_20notrig_rep5_bs1', '--pretrained_model_name_or_path=E:\studio Dropbox\studio\ai\libs\SD\1models\base\SDXL_base.safetensors', '--reg_data_dir=E:\studio Dropbox\studio\ai\data\subjects\_3regularisation images\regs_v1', '--save_every_n_epochs=1', '--save_model_as=safetensors', '--save_precision=bf16', '--seed=12345', '--train_batch_size=1', '--train_data_dir=E:\studio Dropbox\studio\ai\data\subjects\TikoUnconsciousnees\2datasets\trainingfolders\training_t1k0 20\20notrig-5reps', '--log_with', 'wandb', '--wandb_api_key=9328358809ad058d08c0f5e53cfc7f91f3d661b4', '--xformers', '--sample_sampler=dpm_2', '--sample_prompts=E:\studio Dropbox\studio\ai\libs\SD\2loras\1subjects\t1k0\sample\prompt.txt', '--sample_every_n_steps=50']' returned non-zero exit status 1.
```

bmaltais commented 8 months ago

Have you run the setup.bat? Also, make sure you have installed the right version of CUDA (11.8) as specified at https://github.com/bmaltais/kohya_ss#windows-pre-requirements

You might also want to delete the whole venv folder before running setup.bat
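
For anyone following along, a minimal sketch of that clean reinstall on Windows, assuming the default kohya_ss folder layout and that Python 3.10 and CUDA 11.8 are already installed:

```
rem run from the kohya_ss folder in a cmd prompt
rmdir /s /q venv
setup.bat
```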

rafstahelin commented 8 months ago

OK, I upgraded bitsandbytes to 0.43.0 and 22.6.2 is working.
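
For anyone hitting the same ImportError, the fix above amounts to pinning bitsandbytes inside the venv, something like this (assuming the default venv layout):

```
# from the kohya_ss folder, with the venv activated
pip install bitsandbytes==0.43.0
```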

rafstahelin commented 8 months ago

The Runpod Ultimate Template is where the problem lies for me, as I need to train some models and the template's Kohya version is set to v23.0.11, which gives me the error.

rafstahelin commented 8 months ago

Fixed Runpod training by downgrading bitsandbytes to 0.41.0!
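
For other Runpod users, the downgrade is just a version pin in whatever Python environment the template runs kohya_ss from (which environment that is depends on the template), for example:

```
# inside the pod, in the environment kohya_ss uses
pip install bitsandbytes==0.41.0
```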

bmaltais commented 8 months ago

Is bnb 0.43.0 broken on Linux?

rafstahelin commented 8 months ago

From what I understand, when I changed bnb to 0.41.0, AdamW8bit trainings on the 4090 started working again.

rafstahelin commented 8 months ago

Seems like both on Linux and Windows, since the issues went away both on my Runpod training and on my desktop Windows machine.

bmaltais commented 8 months ago

This is interesting... bitsandbytes 0.43.0 works fine on my system and is a build with native Windows support. I am surprised it would cause issues... let's see if there are more reports of this.

rafstahelin commented 7 months ago

> This is interesting... bitsandbytes 0.43.0 works fine on my system and is a build with native Windows support. I am surprised it would cause issues... let's see if there are more reports of this.

Hey, I now tried 23.0.12 with bitsandbytes at 0.43.0 and it works now, but even with gradient checkpointing and xformers selected I am getting 8 hours of training for what was a 2-hour run on the earlier 22.6.2 version with bnb 0.41.0.

(screenshot attached)

So we're getting closer, but something is still off.

rafstahelin commented 7 months ago

> This is interesting... bitsandbytes 0.43.0 works fine on my system and is a build with native Windows support. I am surprised it would cause issues... let's see if there are more reports of this.

Bernard, it seems to be fine now with 0.43.0 on the latest build. Not sure what changed between 23.0.11 and 23.0.12, but training now works at normal speeds.

bmaltais commented 7 months ago

That is great. Might have been VRAM swapping out to shared RAM that caused the slowdown.
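
If anyone wants to verify the spillover theory, one way is to watch GPU memory while a run starts; this is just generic nvidia-smi usage, nothing kohya-specific:

```
# refresh GPU memory usage every 2 seconds while training runs
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 2
```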

sabinakaminska95 commented 7 months ago

I am training multi-GPU on 3x NVIDIA A100 80GB cards; my training takes 15h and keeps increasing, so something still seems wrong. I am on 23.0.11 with bitsandbytes 0.43.0, but I also tried the newest version and it is the same.

```
epoch 1/60 steps: 1%|█▏ | 197/13600 [12:38<14:20:29, 3.85s/it, avr_loss=0.115]
```

```
accelerate launch --num_cpu_threads_per_process=8 "sdxl_train_network.py" \
  --cache_text_encoder_outputs --network_train_unet_only \
  --bucket_reso_steps="32" --bucket_no_upscale --bucket_reso_steps=64 \
  --cache_latents --cache_latents_to_disk --enable_bucket \
  --min_bucket_reso=256 --max_bucket_reso=2048 --gradient_checkpointing \
  --learning_rate="4e-06" --logging_dir="/log" --lr_scheduler="cosine" \
  --lr_scheduler_num_cycles="20" --lr_warmup_steps="13600" \
  --max_data_loader_n_workers="0" --max_grad_norm="1" \
  --resolution="1024,1024" --max_train_steps="13600" --min_snr_gamma=5 \
  --mixed_precision="fp16" --network_alpha="1" --network_dim=8 \
  --network_module=networks.lora --no_half_vae --optimizer_type="AdamW8bit" \
  --output_dir="/model" --output_name="xxx" \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --save_every_n_epochs="5" --save_model_as=safetensors --save_precision="fp16" \
  --text_encoder_lr=4e-06 --train_batch_size="5" --train_data_dir="/img" \
  --unet_lr=4e-06 --xformers
```

bmaltais commented 7 months ago

The change is probably related to the version of BNB and/or kohya's code updates. Hard to really say, since I don't have access to the same training hardware as you do.