bmaltais / kohya_ss

Apache License 2.0

It's not working and I don't know why #1675

Closed PupHops closed 8 months ago

PupHops commented 11 months ago

19:43:08-033420 INFO Version: v22.1.1
19:43:08-043421 INFO nVidia toolkit detected
19:43:13-391874 INFO Torch 2.0.1+cu118
19:43:13-689897 INFO Torch backend: nVidia CUDA 11.8 cuDNN 8700
19:43:13-693900 INFO Torch detected GPU: NVIDIA GeForce RTX 2070 SUPER VRAM 8192 Arch (7, 5) Cores 40
19:43:13-696899 INFO Verifying modules installation status from requirements_windows_torch2.txt...
19:43:13-700899 INFO Verifying modules installation status from requirements.txt...
19:43:20-297451 INFO headless: False
19:43:20-303452 INFO Load CSS...

19:44:13-856789 INFO Loading config...
19:44:16-812035 INFO Start training LoRA Standard ...
19:44:16-813037 INFO Checking for duplicate image filenames in training data directory...
19:44:16-816036 INFO Valid image folder names found in: E:/Stable Difusion/Обучалка/images
19:44:16-818036 INFO Folder 150_Metro2033Tunel: 8 images found
19:44:16-820035 INFO Folder 150_Metro2033Tunel: 1200 steps
19:44:16-822036 INFO Total steps: 1200
19:44:16-824038 INFO Train batch size: 1
19:44:16-826039 INFO Gradient accumulation steps: 1
19:44:16-827039 INFO Epoch: 1
19:44:16-830038 INFO Regulatization factor: 1
19:44:16-832036 INFO max_train_steps (1200 / 1 / 1 * 1 * 1) = 1200
19:44:16-834038 INFO stop_text_encoder_training = 0
19:44:16-840038 INFO lr_warmup_steps = 120
19:44:16-842037 INFO Saving training config to E:/Stable Difusion/Обучалка/Model\Metro2033_20231112-194416.json...
19:44:16-846039 INFO accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="E:\Stable Difusion\stable-diffusion-webui\models\Stable-diffusion\aZovyaRPGArtistTools_v3.safetensors" --train_data_dir="E:/Stable Difusion/Обучалка/images" --resolution="512,512" --output_dir="E:/Stable Difusion/Обучалка/Model" --logging_dir="E:/Stable Difusion/Обучалка/Log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=8 --output_name="Metro2033" --lr_scheduler_num_cycles="1" --no_half_vae --learning_rate="0.0001" --lr_scheduler="cosine" --lr_warmup_steps="120" --train_batch_size="1" --max_train_steps="1200" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="0" --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0
prepare tokenizer
Using DreamBooth method.
prepare images.
found directory E:\Stable Difusion\Обучалка\images\150_Metro2033Tunel contains 8 image files
1200 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: True

[Subset 0 of Dataset 0]
  image_dir: "E:\Stable Difusion\Обучалка\images\150_Metro2033Tunel"
  image_count: 8
  num_repeats: 150
  shuffle_caption: False
  keep_tokens: 0
  caption_dropout_rate: 0.0
  caption_dropout_every_n_epoches: 0
  caption_tag_dropout_rate: 0.0
  caption_prefix: None
  caption_suffix: None
  color_aug: False
  flip_aug: False
  face_crop_aug_range: None
  random_crop: False
  token_warmup_min: 1,
  token_warmup_step: 0,
  is_reg: False
  class_tokens: Metro2033Tunel
  caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|███████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 111.10it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 512), count: 1200
mean ar error (without repeats): 0.0
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: E:\Stable Difusion\stable-diffusion-webui\models\Stable-diffusion\aZovyaRPGArtistTools_v3.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net:
loading vae:
loading text encoder:
Enable xformers for U-Net
Traceback (most recent call last):
  File "E:\Stable Difusion\kohya_ss\train_network.py", line 1009, in <module>
    trainer.train(args)
  File "E:\Stable Difusion\kohya_ss\train_network.py", line 232, in train
    vae.set_use_memory_efficient_attention_xformers(args.xformers)
  File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 251, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 247, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 247, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 247, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 244, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\diffusers\models\attention_processor.py", line 203, in set_use_memory_efficient_attention_xformers
    raise ValueError(
ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\Stable Difusion\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\Stable Difusion\kohya_ss\venv\Scripts\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=E:\Stable Difusion\stable-diffusion-webui\models\Stable-diffusion\aZovyaRPGArtistTools_v3.safetensors', '--train_data_dir=E:/Stable Difusion/Обучалка/images', '--resolution=512,512', '--output_dir=E:/Stable Difusion/Обучалка/Model', '--logging_dir=E:/Stable Difusion/Обучалка/Log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=8', '--output_name=Metro2033', '--lr_scheduler_num_cycles=1', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=120', '--train_batch_size=1', '--max_train_steps=1200', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.
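
The ValueError in the log is raised because `torch.cuda.is_available()` returns False inside the training environment. A minimal diagnostic sketch follows, on the assumption that it is run with the same interpreter the traceback shows (`venv\Scripts\python.exe`). The helper name `cuda_report` is hypothetical and is not part of kohya_ss.

```python
import importlib.util


def cuda_report():
    """Return a dict describing whether PyTorch in this interpreter can see CUDA.

    Hypothetical helper for illustration only; not part of kohya_ss.
    """
    # Avoid an ImportError if torch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # torch.version.cuda is None when the wheel was not built with CUDA.
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_build` prints `None`, the installed wheel was not built with CUDA support. If it prints a version string while `cuda_available` is False, the driver or the device visibility is the more likely place to look.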

Ouinon0 commented 11 months ago

+1, same error.

4090 / Windows, latest driver.

I got it with multiple optimizers (AdamW8bit/Adafactor) on a LoRA training run.

A fresh install doesn't change anything.

floridomeacci commented 11 months ago

Same for me.

Feeling-z commented 10 months ago

+1, same error. Help! Thanks!

ep150de commented 10 months ago

I ran into a similar issue. I had previously been training with Kohya on an RTX 4090 without any problems.

I swapped it out for an Intel Arc 770 GPU to compare training performance and followed the steps to reinstall kohya with the --use-ipex option.

On the Ubuntu command line, it correctly reports that it is using the Intel PyTorch extensions, but runs into issues with missing --xformers.

On the Windows command line, I get the error below after a fresh install. The installed PyTorch version is the CUDA build, not Intel IPEX: the installer pulls the Windows requirements file instead of the Intel IPEX requirements, so some packages end up broken.

Error: ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU

Here's my traceback. CPU: 14900K, RAM: 32 GB, GPU: Intel Arc 770 frostbyte

Traceback (most recent call last):
  File "C:\Users\Demo\kohya_ss\sdxl_train_network.py", line 185, in <module>
    trainer.train(args)
  File "C:\Users\Demo\kohya_ss\train_network.py", line 236, in train
    vae.set_use_memory_efficient_attention_xformers(args.xformers)
  File "C:\Users\Demo\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 251, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "C:\Users\Demo\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 247, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "C:\Users\Demo\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 247, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "C:\Users\Demo\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 247, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "C:\Users\Demo\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 244, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "C:\Users\Demo\kohya_ss\venv\lib\site-packages\diffusers\models\attention_processor.py", line 203, in set_use_memory_efficient_attention_xformers
    raise ValueError(
ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU
Traceback (most recent call last):
  File "C:\Users\Demo\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Demo\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Demo\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\Demo\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\Users\Demo\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "C:\Users\Demo\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\Demo\kohya_ss\venv\Scripts\python.exe', './sdxl_train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0', '--train_data_dir=C:/Users/Demo/Desktop/pat/Training/img', '--resolution=1024,1024', '--output_dir=C:/Users/Demo/Desktop/pat/Training/model', '--logging_dir=C:/Users/Demo/Desktop/pat/Training/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=0.0004', '--unet_lr=0.0004', '--network_dim=256', '--output_name=arcpatv1', '--lr_scheduler_num_cycles=4', '--no_half_vae', '--learning_rate=0.0004', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=1680', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=Adafactor', '--optimizer_args', 'scale_parameter=False', 'relative_step=False', 'warmup_init=False', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--save_state', '--gradient_checkpointing', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.
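
Both tracebacks in this thread stop at the same check: diffusers refuses to enable xformers attention when `torch.cuda.is_available()` is False, and both commands pass `--xformers`. As a rough sketch of one thing to try, the flag could be dropped from the argument list when CUDA is not visible. This assumes training can proceed without `--xformers` on the hardware in question, which has not been verified here. The helper `drop_xformers_without_cuda` is hypothetical and not part of kohya_ss.

```python
def drop_xformers_without_cuda(args, cuda_available):
    """Return a copy of args, minus '--xformers' when CUDA is unavailable.

    Hypothetical helper for illustration only; not part of kohya_ss.
    """
    if cuda_available:
        return list(args)
    return [arg for arg in args if arg != "--xformers"]


# Shortened version of the argument list from the traceback above.
cmd = [
    "./sdxl_train_network.py",
    "--gradient_checkpointing",
    "--xformers",
    "--bucket_no_upscale",
]
print(drop_xformers_without_cuda(cmd, cuda_available=False))
```

The input list is left unmodified, so the original command can still be logged or retried as-is.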