bmaltais / kohya_ss

Apache License 2.0
9.54k stars 1.23k forks

AdamW 8Bit doesn't seem to work #485

Closed wwgam closed 8 months ago

wwgam commented 1 year ago

Now, after the update, AdamW at least seems to work (it was previously not functioning either).

Please let me know if there's a fix for this. At this point I wouldn't mind using an old working version either; is there an old working commit I can clone separately to use for the time being? I just want to build a LoRA model :L AdamW seems to work, but 21 images at 100 steps each easily takes around an hour, even with a batch size of 2 and no gradient accumulation, on a GTX 1070 with 8 GB VRAM.
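For reference, the step count in logs like the one below follows directly from the dataset settings. A quick sketch of the arithmetic, using the numbers mentioned in this thread (21 images, 100 repeats, batch size 2, 1 epoch); the helper function is illustrative, not kohya_ss code:

```python
# Rough arithmetic behind kohya_ss-style "total optimization steps" numbers.
# Values below are the ones reported in this thread; the function itself
# is a sketch, not the trainer's actual code.
def total_optimization_steps(num_images, repeats, batch_size, epochs, grad_accum=1):
    """steps = (images * repeats) // batch_size // grad_accum * epochs"""
    samples_per_epoch = num_images * repeats             # 21 * 100 = 2100
    batches_per_epoch = samples_per_epoch // batch_size  # 2100 // 2 = 1050
    return batches_per_epoch // grad_accum * epochs

print(total_optimization_steps(21, 100, 2, 1))  # 1050, matching the log below
```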

Here's the error from AdamW8bit anyway:

====BUG REPORT====
CUDA SETUP: Loading binary C:\Ext Softwares\Stable diffusion\Kohya(Lora)\kohya_ss\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit AdamW optimizer | {}
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 2100
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 1050
  num epochs / epoch数: 1
  batch size per device / バッチサイズ: 2
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 1050
steps:   0%|          | 0/1050 [00:00<?, ?it/s]
epoch 1/1
Error no kernel image is available for execution on the device at line 167 in file D:\ai\tool\bitsandbytes\csrc\ops.cu
Traceback (most recent call last):
  File "C:\Users\wwgam\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\wwgam\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Ext Softwares\Stable diffusion\Kohya(Lora)\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Ext Softwares\Stable diffusion\Kohya(Lora)\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\Ext Softwares\Stable diffusion\Kohya(Lora)\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\Ext Softwares\Stable diffusion\Kohya(Lora)\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Ext Softwares\Stable diffusion\Kohya(Lora)\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--pretrained_model_name_or_path=C:/Ext Softwares/Stable diffusion/stable-diffusion-webui/models/Stable-diffusion/chilloutmix_NiPrunedFp32Fix.safetensors', '--train_data_dir=C:/Ext Softwares/Stable diffusion/Renders,Projects/Lora Models/Tam/Lora/Image', '--resolution=512,512', '--output_dir=C:/Ext Softwares/Stable diffusion/Renders,Projects/Lora Models/Tam/Lora/Model', '--logging_dir=C:/Ext Softwares/Stable diffusion/Renders,Projects/Lora Models/Tam/Lora/Log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=8', '--output_name=Tamannah', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=105', '--train_batch_size=2', '--max_train_steps=1050', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
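"no kernel image is available for execution on the device" is CUDA's way of saying the loaded binary was not compiled for this GPU's compute capability. A hedged sketch of a pre-flight check: the minimum-capability threshold below is an assumption for illustration, since which compute capabilities a given bitsandbytes build targets varies by version, and the real answer lives in that build, not here.

```python
# Sketch: decide whether a GPU's compute capability is likely covered by a
# prebuilt CUDA kernel. MIN_CAPABILITY is an *assumed* threshold for
# illustration; check the actual bitsandbytes build for the real one.
MIN_CAPABILITY = (7, 0)  # e.g. if the kernels were built for sm_70 and newer

def kernels_likely_supported(capability, minimum=MIN_CAPABILITY):
    """Compare (major, minor) compute-capability tuples lexicographically."""
    return capability >= minimum

# A GTX 1070 is Pascal, compute capability 6.1 (sm_61); an RTX 3090 is 8.6.
print(kernels_likely_supported((6, 1)))  # False under this assumed threshold
print(kernels_likely_supported((8, 6)))  # True

# With PyTorch installed, the real capability can be queried (guarded,
# since not every machine in this thread has working CUDA):
try:
    import torch
    if torch.cuda.is_available():
        print(torch.cuda.get_device_capability(0))
except ImportError:
    pass
```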

bmaltais commented 1 year ago

Adam8bit is a bit difficult: it is not supported on all cards, and I don't know which cards support it and which don't, but given the errors above I have a feeling the 1070 might not work. So if AdamW works, go with that. I personally don't use AdamW8bit anymore, as the quality of the output is not as good as AdamW's.
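The advice above boils down to a tiny decision: use AdamW8bit only when both the bitsandbytes library and its 8-bit kernels actually work on the card, otherwise fall back to plain AdamW. A hypothetical helper sketching that choice in terms of the --optimizer_type values visible in the traceback (the function itself is not part of kohya_ss):

```python
# Sketch: pick the kohya_ss --optimizer_type value. Both option names
# ("AdamW8bit", "AdamW") appear in this thread; the helper is hypothetical.
# Note the failure in this issue happened at kernel-launch time, not import
# time, so a simple import check alone would not have caught it.
def pick_optimizer_type(bnb_available, gpu_supports_8bit_kernels):
    if bnb_available and gpu_supports_8bit_kernels:
        return "AdamW8bit"
    return "AdamW"

# GTX 1070: bitsandbytes imports fine, but the 8-bit kernels fail to launch.
print(pick_optimizer_type(bnb_available=True, gpu_supports_8bit_kernels=False))  # AdamW
```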

wwgam commented 1 year ago

I see! Thanks for that insight, I didn't know AdamW was better in quality than the 8-bit one. Thanks for the info, guess I'll use AdamW even though it takes an hour :)

garyakimoto commented 1 year ago

Yes, there is a problem in the kohya_ss version. I tried Automatic1111 with Adam8bit and can set a train batch size of >12 on my 3090 WITHOUT problems, but in kohya_ss I get an error whenever I set the train batch size above 4. Hope this can be fixed soon.

bmaltais commented 1 year ago

> Yes, there is a problem in the kohya_ss version. I tried Automatic1111 with Adam8bit and can set a train batch size of >12 on my 3090 WITHOUT problems, but in kohya_ss I get an error whenever I set the train batch size above 4. Hope this can be fixed soon.

You should report this as an issue to Kohya via his repo, as he is the one who would need to fix it in his code.

garyakimoto commented 1 year ago

> Yes, there is a problem in the kohya_ss version. I tried Automatic1111 with Adam8bit and can set a train batch size of >12 on my 3090 WITHOUT problems, but in kohya_ss I get an error whenever I set the train batch size above 4. Hope this can be fixed soon.

> You should report this as an issue to Kohya via his repo, as he is the one who would need to fix it in his code.

I find that when I enable "Gradient checkpointing" it lets me start model training, but the updates are slow...
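That slowdown is expected: gradient checkpointing trades compute for memory. Instead of keeping every layer's activations for the backward pass, it keeps a few checkpoints and recomputes the rest, which is why larger batches fit but each step takes longer. A toy sketch of the trade-off (the function and numbers are illustrative, not kohya_ss code):

```python
import math

# Toy model of gradient checkpointing's cost: keep one stored activation
# per segment of layers, recompute everything else during backward.
# Illustrative only; real memory/speed numbers depend on the model.
def checkpointing_costs(n_layers, segment_size):
    """Return (activations stored, layers recomputed in backward)."""
    stored = math.ceil(n_layers / segment_size)  # one checkpoint per segment
    recomputed = n_layers - stored               # the rest are redone
    return stored, recomputed

# 24 layers, checkpoint every 6: store 4 activations instead of 24,
# at the price of recomputing 20 layers' forward passes during backward.
print(checkpointing_costs(24, 6))  # (4, 20)
```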