PupHops closed this issue 8 months ago.
+1, same error.
RTX 4090 / Windows, latest driver.
I get it with multiple optimizers (AdamW8bit/Adafactor) when training a LoRA.
A fresh install doesn't change anything.
Same for me.
+1, same error. Help! Thanks!
I ran into a similar issue. I was previously training with Kohya on an RTX 4090 without problems.
I swapped it out for an Intel Arc 770 GPU to compare training performance and followed the steps to reinstall Kohya with the --use-ipex flag.
On Ubuntu, the command-line output shows that Intel Extension for PyTorch (IPEX) is being used, but training runs into issues with --xformers being missing.
On Windows, I get this error after a fresh install. The installed PyTorch is the CUDA build, not the Intel IPEX build: setup pulls the Windows requirements file instead of the Intel IPEX requirements, so some packages end up broken.
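The Windows symptom described above, a CUDA build of PyTorch installed where the IPEX build was expected, can be confirmed from inside the venv before starting a training run. The sketch below is a minimal diagnostic, assuming it is run with the kohya_ss venv's own Python interpreter; `describe_torch_backend` is a hypothetical helper, not part of kohya_ss. It looks for the `intel_extension_for_pytorch` package and guards the `torch.xpu` lookup, since that attribute is only registered once IPEX is imported (or on newer PyTorch releases with built-in XPU support).

```python
import importlib.util


def describe_torch_backend() -> dict:
    """Summarize which PyTorch build and accelerator backends this interpreter can see.

    Hypothetical helper: run it with the same venv python that kohya_ss uses.
    """
    info = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "ipex_installed": importlib.util.find_spec("intel_extension_for_pytorch") is not None,
        "torch_version": None,    # e.g. a "+cu118" suffix indicates a CUDA wheel
        "cuda_build": None,       # CUDA version torch was compiled against (None if not a CUDA build)
        "cuda_available": False,  # the condition diffusers checks before enabling xformers
        "xpu_available": False,   # Intel GPU visible through torch.xpu
        "import_error": None,
    }
    if not info["torch_installed"]:
        return info

    try:
        import torch
    except (ImportError, OSError) as exc:  # e.g. a broken or half-installed wheel
        info["import_error"] = repr(exc)
        return info

    info["torch_version"] = torch.__version__
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = bool(torch.cuda.is_available())

    if info["ipex_installed"]:
        try:
            # Importing IPEX registers the torch.xpu device on older PyTorch releases.
            import intel_extension_for_pytorch  # noqa: F401
        except Exception as exc:  # assumption: a torch/IPEX version mismatch may fail here
            info["import_error"] = repr(exc)

    xpu = getattr(torch, "xpu", None)
    info["xpu_available"] = bool(xpu is not None and xpu.is_available())
    return info


if __name__ == "__main__":
    for key, value in describe_torch_backend().items():
        print(f"{key}: {value}")
```

On an Arc card, a working IPEX setup would be expected to report `ipex_installed: True` and `xpu_available: True`; a CUDA-style `torch_version` together with `ipex_installed: False` matches the wrong-requirements problem described above. On an NVIDIA card, `cuda_available: False` is exactly the condition behind the xformers `ValueError` reported in this thread.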
Here's my traceback. CPU: 14900K, RAM: 32 GB, GPU: Intel Arc 770 Frostbyte.
Traceback (most recent call last):
File "C:\Users\Demo\kohya_ss\sdxl_train_network.py", line 185, in <module>
19:43:08-033420 INFO Version: v22.1.1
19:43:08-043421 INFO nVidia toolkit detected
19:43:13-391874 INFO Torch 2.0.1+cu118
19:43:13-689897 INFO Torch backend: nVidia CUDA 11.8 cuDNN 8700
19:43:13-693900 INFO Torch detected GPU: NVIDIA GeForce RTX 2070 SUPER VRAM 8192 Arch (7, 5) Cores 40
19:43:13-696899 INFO Verifying modules installation status from requirements_windows_torch2.txt...
19:43:13-700899 INFO Verifying modules installation status from requirements.txt...
19:43:20-297451 INFO headless: False
19:43:20-303452 INFO Load CSS...
19:44:13-856789 INFO Loading config...
19:44:16-812035 INFO Start training LoRA Standard ...
19:44:16-813037 INFO Checking for duplicate image filenames in training data directory...
19:44:16-816036 INFO Valid image folder names found in: E:/Stable Difusion/Обучалка/images
19:44:16-818036 INFO Folder 150_Metro2033Tunel: 8 images found
19:44:16-820035 INFO Folder 150_Metro2033Tunel: 1200 steps
19:44:16-822036 INFO Total steps: 1200
19:44:16-824038 INFO Train batch size: 1
19:44:16-826039 INFO Gradient accumulation steps: 1
19:44:16-827039 INFO Epoch: 1
19:44:16-830038 INFO Regulatization factor: 1
19:44:16-832036 INFO max_train_steps (1200 / 1 / 1 * 1 * 1) = 1200
19:44:16-834038 INFO stop_text_encoder_training = 0
19:44:16-840038 INFO lr_warmup_steps = 120
19:44:16-842037 INFO Saving training config to E:/Stable Difusion/Обучалка/Model\Metro2033_20231112-194416.json...
19:44:16-846039 INFO accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="E:\Stable Difusion\stable-diffusion-webui\models\Stable-diffusion\aZovyaRPGArtistTools_v3.safetensors" --train_data_dir="E:/Stable Difusion/Обучалка/images" --resolution="512,512" --output_dir="E:/Stable Difusion/Обучалка/Model" --logging_dir="E:/Stable Difusion/Обучалка/Log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=8 --output_name="Metro2033" --lr_scheduler_num_cycles="1" --no_half_vae --learning_rate="0.0001" --lr_scheduler="cosine" --lr_warmup_steps="120" --train_batch_size="1" --max_train_steps="1200" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="0" --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0
prepare tokenizer
Using DreamBooth method.
prepare images.
found directory E:\Stable Difusion\Обучалка\images\150_Metro2033Tunel contains 8 image files
1200 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: True

  [Subset 0 of Dataset 0]
    image_dir: "E:\Stable Difusion\Обучалка\images\150_Metro2033Tunel"
    image_count: 8
    num_repeats: 150
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: Metro2033Tunel
    caption_extension: .txt

[Dataset 0] loading image sizes.
100%|███████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 111.10it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 512), count: 1200
mean ar error (without repeats): 0.0
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: E:\Stable Difusion\stable-diffusion-webui\models\Stable-diffusion\aZovyaRPGArtistTools_v3.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net:
loading vae:
loading text encoder:
Enable xformers for U-Net
Traceback (most recent call last):
File "E:\Stable Difusion\kohya_ss\train_network.py", line 1009, in <module>
trainer.train(args)
File "E:\Stable Difusion\kohya_ss\train_network.py", line 232, in train
vae.set_use_memory_efficient_attention_xformers(args.xformers)
File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 251, in set_use_memory_efficient_attention_xformers
fn_recursive_set_mem_eff(module)
File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 247, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 247, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 247, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 244, in fn_recursive_set_mem_eff
module.set_use_memory_efficient_attention_xformers(valid, attention_op)
File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\diffusers\models\attention_processor.py", line 203, in set_use_memory_efficient_attention_xformers
raise ValueError(
ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU
Traceback (most recent call last):
File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "E:\Stable Difusion\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
args.func(args)
File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
simple_launcher(args)
File "E:\Stable Difusion\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\Stable Difusion\kohya_ss\venv\Scripts\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=E:\Stable Difusion\stable-diffusion-webui\models\Stable-diffusion\aZovyaRPGArtistTools_v3.safetensors', '--train_data_dir=E:/Stable Difusion/Обучалка/images', '--resolution=512,512', '--output_dir=E:/Stable Difusion/Обучалка/Model', '--logging_dir=E:/Stable Difusion/Обучалка/Log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=8', '--output_name=Metro2033', '--lr_scheduler_num_cycles=1', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=120', '--train_batch_size=1', '--max_train_steps=1200', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.
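The `ValueError` above is raised by diffusers' `set_use_memory_efficient_attention_xformers`, which refuses to enable xformers while `torch.cuda.is_available()` is False; with `--xformers` set, `train_network.py` passes `True` into that call (`vae.set_use_memory_efficient_attention_xformers(args.xformers)`). One workaround, sketched below under the assumption that you assemble the training arguments in a script of your own, is to pass `--xformers` only when CUDA is actually visible. `build_train_args` and `cuda_is_available` are hypothetical helpers, not kohya_ss functions, and the argument list is a shortened excerpt of the command in the log.

```python
import shlex


def cuda_is_available() -> bool:
    """Return torch.cuda.is_available(), or False if torch cannot be imported."""
    try:
        import torch
    except (ImportError, OSError):
        return False
    return bool(torch.cuda.is_available())


def build_train_args(base_args: list[str], cuda_available: bool) -> list[str]:
    """Copy the train_network.py arguments, dropping --xformers when CUDA is unavailable.

    diffusers raises ValueError if xformers attention is requested while
    torch.cuda.is_available() is False, so the flag is only kept when CUDA is usable.
    """
    if cuda_available:
        return list(base_args)
    return [arg for arg in base_args if arg != "--xformers"]


if __name__ == "__main__":
    # Shortened excerpt of the arguments from the log above.
    train_args = [
        "./train_network.py",
        "--network_module=networks.lora",
        "--optimizer_type=AdamW8bit",
        "--cache_latents",
        "--xformers",
        "--bucket_no_upscale",
    ]
    cmd = ["accelerate", "launch", "--num_cpu_threads_per_process=2",
           *build_train_args(train_args, cuda_is_available())]
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(cmd))
```

Dropping the flag only sidesteps the check. On an NVIDIA card, `cuda_is_available()` returning False means the venv's PyTorch cannot see the GPU at all, so the install itself still needs fixing.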