Closed: rafstahelin closed this issue 6 months ago
Let's see if we can zero in on the source of this. What was the last version that worked? Let me know which release it was and then we can compare the requirements module list between the new v23.0.x release and that one... I suspect one of the updated modules might be consuming more VRAM... or perhaps an update to kohya's sd-scripts is now consuming more VRAM... or a combination of both...
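For reference, one way to compare the pinned modules between two releases is to diff their requirements files; a rough sketch, assuming the release tags are named like v22.6.2 and v23.0.11 on the releases page and that the pins still live in requirements.txt:

```bash
# Grab the repo and compare the dependency pins between two tagged releases
git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
# Adjust the tag names if they differ on the releases page
git diff v22.6.2 v23.0.11 -- requirements.txt
```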
I believe 21.8.7 was still good
Maybe a bitsandbytes update issue with AdamW8bit?
WOW, this is an old release... Yeah, sd-scripts has changed so much since then that the difference is most probably stemming from there. You will probably have to stick to that release.
Actually I am on 22.6.0 locally and it works...
The problem is really with Ashley's Ultimate SD template on Runpod which uses v23.0.11
I will let him know. But it would be good to sort out this VRAM issue, as your new GUI is really cool.
Does v22.6.2 work? If it does not, then the issue is with the sd-scripts update... if v22.6.2 works but it no longer works in v23.0.x, then it is related to the new requirements in v23...
Can you confirm v22.6.2 is working or failing from a VRAM consumption point of view?
Hey Bernard, I am getting this error when I install 22.6.2:
```
Traceback (most recent call last):
  File "E:\kohya2262\kohya_ss\sdxl_train.py", line 792, in
```
Have you run the setup.bat? Also, make sure you have installed the right version of CUDA (11.8) as specified at https://github.com/bmaltais/kohya_ss#windows-pre-requirements
You might also want to delete the whole venv folder before running setup.bat
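For completeness, a clean reinstall looks roughly like this (a sketch, assuming you run it from the kohya_ss folder and that the setup scripts keep their default names):

```bash
# From the kohya_ss checkout: remove the old virtual environment, then rebuild it.
# Windows (cmd):
#   rmdir /s /q venv
#   setup.bat
# Linux:
rm -rf venv
./setup.sh
```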
OK, I upgraded bitsandbytes to 0.43.0 and 22.6.2 is working.
The Runpod Ultimate Template is where the problem lies for me, as I need to train some models and the template's Kohya version is set to v23.0.11, which gives me the error.
Fixed Runpod training by downgrading bitsandbytes to 0.41.0!!
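For anyone hitting the same thing, this is roughly what the pin looks like inside the training environment (a sketch; 0.41.0 is just the version that happened to work here, adjust as needed):

```bash
# Check which bitsandbytes version is currently installed
pip show bitsandbytes

# Pin the version that worked in this thread; install a newer pin later to go back
pip install bitsandbytes==0.41.0
```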
Is bnb 0.43.0 broken on Linux?
From what I understand, when I changed bnb to 0.41.0, trainings with AdamW8bit on the 4090 started working again.
Seems like both on Linux and Windows, since the issues went away both on my Runpod training and on my desktop Windows machine.
This is interesting... bitsandbytes 0.43.0 works fine on my system and is a build with native Windows support. I am surprised it would cause issues... let's see if there are more reports of this.
Hey, I now tried v23.0.12 with bitsandbytes at 0.43.0 and it works now, but even with gradient checkpointing and xformers selected I am getting 8 hours of training for what was a 2-hour training on the earlier 22.6.2 version with bnb 0.41.0.
So we're getting closer, but something is still off.
Bernard, it seems to be fine now with 0.43.0 on the latest build. Not sure what changed between 23.0.11 and 23.0.12, but training now works at normal speeds.
That is great. Might have been VRAM swapping out to shared RAM that caused the slowdown.
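One way to check for that while a run is going is to watch VRAM usage against the card's limit; a minimal sketch using nvidia-smi (note it only reports dedicated VRAM, so the shared-RAM spillover itself is easiest to confirm in Windows Task Manager):

```bash
# Poll dedicated VRAM usage every 2 seconds during training; if it sits at the
# card's limit while s/it climbs sharply, the driver is likely spilling into shared system RAM.
nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv -l 2
```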
I am working with multi-GPU on 3x NVIDIA A100 80GB cards; my training takes 15h and keeps increasing, so something is still wrong. I am on 23.0.11 with bitsandbytes 0.43.0, but I also tried the newest version and it is the same.
```
epoch 1/60 steps: 1%|█▏ | 197/13600 [12:38<14:20:29, 3.85s/it, avr_loss=0.115]
```
```
accelerate launch --num_cpu_threads_per_process=8 "sdxl_train_network.py" \
  --cache_text_encoder_outputs --network_train_unet_only \
  --bucket_reso_steps="32" --bucket_no_upscale --bucket_reso_steps=64 \
  --cache_latents --cache_latents_to_disk \
  --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 \
  --gradient_checkpointing --learning_rate="4e-06" --logging_dir="/log" \
  --lr_scheduler="cosine" --lr_scheduler_num_cycles="20" --lr_warmup_steps="13600" \
  --max_data_loader_n_workers="0" --max_grad_norm="1" \
  --resolution="1024,1024" --max_train_steps="13600" --min_snr_gamma=5 \
  --mixed_precision="fp16" --network_alpha="1" --network_dim=8 --network_module=networks.lora \
  --no_half_vae --optimizer_type="AdamW8bit" \
  --output_dir="/model" --output_name="xxx" \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --save_every_n_epochs="5" --save_model_as=safetensors --save_precision="fp16" \
  --text_encoder_lr=4e-06 --train_batch_size="5" --train_data_dir="/img" \
  --unet_lr=4e-06 --xformers
```
The change is probably related to the version of BNB and/or kohya's code updates. Hard to say for sure since I don't have access to the same training hardware as you do.
Recently I am no longer able to train. VRAM usage for a simple LoRA at batch size 1 with gradient checkpointing has gone above the 4090's capacity. I have the optimizers on, but I get either an OOM message on Kohya v23 or 50 hours of training for what was 3 hours.
I am training 20 images at a resolution of 1024x1280.
What has changed that we can no longer train a simple LoRA when it was so easy before?
Here's my JSON for training:
{ "adaptive_noise_scale": 0, "additional_parameters": "--max_grad_norm=0", "bucket_no_upscale": true, "bucket_reso_steps": 64, "cache_latents": true, "cache_latents_to_disk": true, "caption_dropout_every_n_epochs": 0.0, "caption_dropout_rate": 0, "caption_extension": ".txt", "clip_skip": "1", "color_aug": false, "enable_bucket": true, "epoch": 20, "flip_aug": false, "full_bf16": false, "full_fp16": false, "gpu_ids": "", "gradient_accumulation_steps": 1, "gradient_checkpointing": true, "keep_tokens": 1, "learning_rate": 0.0001, "learning_rate_te": 1e-05, "learning_rate_te1": 1e-05, "learning_rate_te2": 5e-05, "logging_dir": "/workspace/training_t1k0", "lr_scheduler": "cosine", "lr_scheduler_args": "", "lr_scheduler_num_cycles": "", "lr_scheduler_power": "", "lr_warmup": 5, "max_bucket_reso": 2048, "max_data_loader_n_workers": "0", "max_resolution": "1024,1280", "max_timestep": 1000, "max_token_length": "75", "max_train_epochs": "", "max_train_steps": "", "mem_eff_attn": false, "min_bucket_reso": 256, "min_snr_gamma": 0, "min_timestep": 0, "mixed_precision": "bf16", "model_list": "custom", "multi_gpu": false, "multires_noise_discount": 0.2, "multires_noise_iterations": 8, "no_token_padding": false, "noise_offset": 0.0357, "noise_offset_type": "Original", "num_cpu_threads_per_process": 4, "num_machines": 1, "num_processes": 1, "optimizer": "AdamW8bit", "optimizer_args": "", "output_dir": "/workspace/stable-diffusion-webui/models/Lora", "output_name": "t1k0_v01_ds20notrig_rep5_bs1", "persistent_data_loader_workers": false, "pretrained_model_name_or_path": "/workspace/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors", "prior_loss_weight": 1.0, "random_crop": false, "reg_data_dir": "/workspace/regs", "resume": "", "sample_every_n_epochs": 0, "sample_every_n_steps": 50, "sample_prompts": "a photo of t1k0 woman, looking at viewer, black eyes, on the street, jewelry, upper body,wearing tights and jacket, portrait, close up, fisheye --w 1024 --h 1280 --d -1 --l 7.5 --s 20\na photo of t1k0 woman, looking at viewer, curly hair, black eyes, on the street, jewelry, upper body,wearing tights and jacket, portrait, close up --w 1024 --h 1280 --d -1 --l 7.5 --s 20", "sample_sampler": "dpm_2", "save_every_n_epochs": 1, "save_every_n_steps": 0, "save_last_n_steps": 0, "save_last_n_steps_state": 0, "save_model_as": "safetensors", "save_precision": "bf16", "save_state": false, "scale_v_pred_loss_like_noise_pred": false, "sdxl": true, "seed": "12345", "shuffle_caption": false, "stop_text_encoder_training": 0, "train_batch_size": 1, "train_data_dir": "/workspace/training_t1k0/20notrig-5reps", "use_wandb": true, "v2": false, "v_parameterization": false, "v_pred_like_loss": 0, "vae": "", "vae_batch_size": 0, "weighted_captions": false, "xformers": "xformers"