NigaKniga opened this issue 7 months ago
The error log indicates a CUDA out-of-memory (OOM) error during a PyTorch training run with GPU acceleration. It occurs when the training process tries to allocate more memory on the GPU than is available. Specifically, the error message states:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB. GPU 0 has a total capacity of 11.00 GiB of which 0 bytes is free. Of the allocated memory 16.98 GiB is allocated by PyTorch, and 696.75 MiB is reserved by PyTorch but unallocated.
This error is a common issue in deep learning tasks, especially when working with large models or large batches of data.
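On an 11 GiB GTX 1080 Ti, full SDXL fine-tuning at batch size 3 is very likely to exhaust VRAM. The usual remedies are lowering --train_batch_size (compensating with gradient accumulation), keeping --gradient_checkpointing and --xformers enabled, and, as the message itself suggests, tuning the caching allocator. A minimal sketch of the allocator setting, assuming it is applied before PyTorch initializes CUDA; the value 128 is illustrative, not a recommendation:

```python
import os

# PYTORCH_CUDA_ALLOC_CONF must be set before torch allocates any CUDA
# memory, so set it before `import torch` (or in the shell that launches
# training). "max_split_size_mb" caps the block size the caching allocator
# will split, which reduces fragmentation; 128 here is an illustrative value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

On Windows the same effect can be had with `set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` in the console before launching the GUI.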
04:42:47-667199 INFO Kohya_ss GUI version: v23.0.15
04:42:47-674189 ERROR [WinError 2] The system cannot find the file specified
04:42:47-677169 INFO nVidia toolkit detected
04:42:49-121313 INFO Torch 2.1.2+cu118
04:42:49-135252 INFO Torch backend: nVidia CUDA 11.8 cuDNN 8700
04:42:49-137275 INFO Torch detected GPU: NVIDIA GeForce GTX 1080 Ti VRAM 11264 Arch (6, 1) Cores 28
04:42:49-141264 INFO Python version is 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
04:42:49-144228 INFO Verifying modules installation status from requirements_windows_torch2.txt...
04:42:49-148237 INFO Verifying modules installation status from requirements.txt...
04:42:51-364290 INFO headless: False
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
04:43:07-021434 INFO Loading config...
04:44:04-488717 INFO Start training Dreambooth...
04:44:04-489695 INFO Validating model file or folder path C:/Users/user/Desktop/Tools/SD auto forge/webui/models/Stable-diffusion/ponyDiffusionV6XL_v6StartWithThisOne.safetensors existence...
04:44:04-491690 INFO ...valid
04:44:04-492687 INFO Validating output_dir path /workspace/kohya_ss/output/SDXL1.0-LoRa_Zeitgeist-Photographic-Style_by-AI_Characters-v2.0 existence...
04:44:04-493685 INFO ...valid
04:44:04-494698 INFO Validating train_data_dir path C:\Users\user\Desktop\LORAimg\Image existence...
04:44:04-495679 INFO ...valid
04:44:04-496677 INFO reg_data_dir not specified, skipping validation
04:44:04-497674 INFO Validating logging_dir path /workspace/kohya_ss/output/SDXL1.0-LoRa_Zeitgeist-Photographic-Style_by-AI_Characters-v2.0 existence...
04:44:04-498671 INFO ...valid
04:44:04-499670 INFO log_tracker_config not specified, skipping validation
04:44:04-500666 INFO resume not specified, skipping validation
04:44:04-501663 INFO vae not specified, skipping validation
04:44:04-502661 INFO dataset_config not specified, skipping validation
04:44:04-505652 INFO Folder 100_subject: steps 132800
04:44:04-506650 INFO max_train_steps (132800 / 3 / 1 * 50 * 1) = 2213334
04:44:04-507648 INFO stop_text_encoder_training = 0
04:44:04-508645 INFO lr_warmup_steps = 0
04:44:04-509642 INFO Saving training config to /workspace/kohya_ss/output/SDXL1.0-LoRa_Zeitgeist-Photographic-Style_by-AI_Characters-v2.0\MintyRyik:Gasai_Yuno_20240407-044404.json...
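The max_train_steps figure the GUI prints is just folder steps ÷ batch size ÷ gradient accumulation × epochs, rounded up. A sketch of that arithmetic (the ceiling rounding is inferred from the printed result):

```python
import math

folder_steps = 132800     # 1328 images x 100 repeats from the "100_subject" folder
batch_size = 3
grad_accum_steps = 1
epochs = 50

max_train_steps = math.ceil(folder_steps / batch_size / grad_accum_steps * epochs)
print(max_train_steps)  # 2213334, matching the log line
```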
04:44:04-511638 INFO accelerate launch --num_cpu_threads_per_process=2 "C:\Users\user\kohya_ss/sd-scripts/sdxl_train.py" --bucket_reso_steps=64 --cache_latents --cache_latents_to_disk --caption_dropout_rate="0.05" --caption_extension=".txt" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --gradient_checkpointing --learning_rate="3e-05" --learning_rate_te1="1e-05" --learning_rate_te2="1e-05" --logging_dir="/workspace/kohya_ss/output/SDXL1.0-LoRa_Zeitgeist-Photographic-Style_by-AI_Characters-v2.0" --lr_scheduler="constant" --lr_scheduler_num_cycles="50" --max_data_loader_n_workers="0" --resolution="1024,1024" --max_train_epochs=50 --max_train_steps="2213334" --min_snr_gamma=5 --mixed_precision="fp16" --optimizer_type="AdamW" --output_dir="/workspace/kohya_ss/output/SDXL1.0-LoRa_Zeitgeist-Photographic-Style_by-AI_Characters-v2.0" --output_name="LORAtest" --pretrained_model_name_or_path="C:/Users/user/Desktop/Tools/SD auto forge/webui/models/Stable-diffusion/ponyDiffusionV6XL_v6StartWithThisOne.safetensors" --save_every_n_epochs="1" --save_model_as=safetensors --save_precision="fp16" --train_batch_size="3" --train_data_dir="C:\Users\user\Desktop\LORAimg\Image" --xformers
The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'
2024-04-07 04:44:12 INFO prepare tokenizers sdxl_train_util.py:135
INFO Using DreamBooth method. sdxl_train.py:140
2024-04-07 04:44:13 INFO prepare images. train_util.py:1469
INFO found directory C:\Users\user\Desktop\LORAimg\Image\100_subject contains 1328 image files train_util.py:1432
INFO 132800 train images with repeating. train_util.py:1508
INFO 0 reg images. train_util.py:1511
WARNING no regularization images train_util.py:1516
INFO [Dataset 0] config_util.py:544
  batch_size: 3
  resolution: (1024, 1024)
  enable_bucket: True
  network_multiplier: 1.0
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: False
100%|███████████████████████████████████████████████████████████████████████████| 1328/1328 [00:00<00:00, 10823.84it/s]
INFO make buckets train_util.py:800
INFO number of images per bucket (including repeats) train_util.py:846
INFO bucket 0: resolution (896, 1152), count: 100 train_util.py:851
INFO bucket 1: resolution (1024, 1024), count: 200 train_util.py:851
INFO bucket 2: resolution (1088, 960), count: 500 train_util.py:851
INFO bucket 3: resolution (1152, 896), count: 100 train_util.py:851
INFO bucket 4: resolution (1344, 768), count: 131900 train_util.py:851
INFO mean ar error (without repeats): 0.02777957066360671 train_util.py:856
INFO prepare accelerator sdxl_train.py:197
accelerator device: cuda
INFO loading model for process 0/1 sdxl_train_util.py:31
INFO load StableDiffusion checkpoint: C:/Users/user/Desktop/Tools/SD auto forge/webui/models/Stable-diffusion/ponyDiffusionV6XL_v6StartWithThisOne.safetensors sdxl_train_util.py:71
INFO building U-Net sdxl_model_util.py:192
INFO loading U-Net from checkpoint sdxl_model_util.py:196
2024-04-07 04:44:24 INFO U-Net: sdxl_model_util.py:202
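The five bucket resolutions in the log come from aspect-ratio bucketing: both side lengths are snapped to multiples of bucket_reso_steps (64) while the pixel area stays at or below the training resolution (1024×1024). A rough sketch of such an enumeration; the function name and exact policy are illustrative assumptions, not kohya-ss's actual implementation:

```python
def make_buckets(base=1024, step=64, min_side=256, max_side=2048):
    """Enumerate (width, height) pairs whose sides are multiples of `step`
    and whose area is at most base*base (an illustrative sketch of
    aspect-ratio bucketing, not the actual kohya-ss code)."""
    max_area = base * base
    buckets = set()
    for w in range(min_side, max_side + 1, step):
        # largest multiple-of-step height that keeps the area in budget
        h = min(max_side, (max_area // w) // step * step)
        if h >= min_side:
            buckets.add((w, h))
            buckets.add((h, w))  # the mirrored portrait/landscape bucket
    return sorted(buckets)

# All five resolutions reported in the log appear in the enumeration:
for reso in [(896, 1152), (1024, 1024), (1088, 960), (1152, 896), (1344, 768)]:
    assert reso in make_buckets()
```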
2024-04-07 04:44:25 INFO building text encoders sdxl_model_util.py:205
2024-04-07 04:44:27 INFO loading text encoders from checkpoint sdxl_model_util.py:258
2024-04-07 04:44:28 INFO text encoder 1: sdxl_model_util.py:272
2024-04-07 04:44:32 INFO text encoder 2: sdxl_model_util.py:276
INFO building VAE sdxl_model_util.py:279
INFO loading VAE from checkpoint sdxl_model_util.py:284
2024-04-07 04:44:33 INFO VAE: sdxl_model_util.py:287
Disable Diffusers' xformers
INFO Enable xformers for U-Net train_util.py:2529
2024-04-07 04:44:34 INFO [Dataset 0] train_util.py:1948
INFO caching latents. train_util.py:915
INFO checking cache validity... train_util.py:925
100%|████████████████████████████████████████████████████████████████████████████| 1328/1328 [00:00<00:00, 4393.60it/s]
INFO caching latents... train_util.py:962
100%|██████████████████████████████████████████████████████████████████████████████| 1328/1328 [20:02<00:00, 1.10it/s]
train unet: True, text_encoder1: False, text_encoder2: False
number of models: 1
number of trainable parameters: 2567463684
prepare optimizer, data loader etc.
2024-04-07 05:04:37 INFO use AdamW optimizer | {} train_util.py:3819
override steps. steps for 50 epochs: 2213450
running training
num examples: 132800
num batches per epoch: 44269
num epochs: 50
batch size per device: 3
gradient accumulation steps: 1
total optimization steps: 2213450
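The override from 2,213,334 to 2,213,450 steps is explained by the bucketing: batches never mix buckets, so each bucket's partial batch rounds up separately. A sketch using the bucket counts from the log (the per-bucket ceiling is an assumption, but it reproduces the printed numbers exactly):

```python
import math

# Bucket image counts (including repeats) from the "make buckets" log lines
bucket_counts = [100, 200, 500, 100, 131900]
batch_size = 3
epochs = 50

# Each bucket is batched independently, so partial batches round up per bucket
batches_per_epoch = sum(math.ceil(n / batch_size) for n in bucket_counts)
total_steps = batches_per_epoch * epochs
print(batches_per_epoch, total_steps)  # 44269 2213450, matching the log
```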
steps: 0%| | 0/2213450 [00:00<?, ?it/s]
epoch 1/50
Traceback (most recent call last):
  File "C:\Users\user\kohya_ss\sd-scripts\sdxl_train.py", line 792, in <module>
    train(args)
  File "C:\Users\user\kohya_ss\sd-scripts\sdxl_train.py", line 570, in train
    noise_pred = unet(noisy_latents, timesteps, text_embedding, vector_embedding)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 680, in forward
    return model_forward(*args, **kwargs)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 668, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "C:\Users\user\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 1111, in forward
    h = call_module(module, h, emb, context)
  File "C:\Users\user\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 1095, in call_module
    x = layer(x, context)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\user\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 750, in forward
    hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\user\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 669, in forward
    output = torch.utils.checkpoint.checkpoint(
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\_dynamo\eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\_dynamo\external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\utils\checkpoint.py", line 451, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\autograd\function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\utils\checkpoint.py", line 230, in forward
    outputs = run_function(*args)
  File "C:\Users\user\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 665, in custom_forward
    return func(*inputs)
  File "C:\Users\user\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 651, in forward_body
    norm_hidden_states = self.norm2(hidden_states)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\nn\modules\normalization.py", line 196, in forward
    return F.layer_norm(
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\torch\nn\functional.py", line 2543, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB. GPU 0 has a total capacty of 11.00 GiB of which 0 bytes is free. Of the allocated memory 16.98 GiB is allocated by PyTorch, and 696.75 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
steps: 0%| | 0/2213450 [04:54<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\user\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "C:\Users\user\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\user\\kohya_ss\\venv\\Scripts\\python.exe', 'C:\\Users\\user\\kohya_ss/sd-scripts/sdxl_train.py', '--bucket_reso_steps=64', '--cache_latents', '--cache_latents_to_disk', '--caption_dropout_rate=0.05', '--caption_extension=.txt', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--gradient_checkpointing', '--learning_rate=3e-05', '--learning_rate_te1=1e-05', '--learning_rate_te2=1e-05', '--logging_dir=/workspace/kohya_ss/output/SDXL1.0-LoRa_Zeitgeist-Photographic-Style_by-AI_Characters-v2.0', '--lr_scheduler=constant', '--lr_scheduler_num_cycles=50', '--max_data_loader_n_workers=0', '--resolution=1024,1024', '--max_train_epochs=50', '--max_train_steps=2213334', '--min_snr_gamma=5', '--mixed_precision=fp16', '--optimizer_type=AdamW', '--output_dir=/workspace/kohya_ss/output/SDXL1.0-LoRa_Zeitgeist-Photographic-Style_by-AI_Characters-v2.0', '--output_name=LORAtest', '--pretrained_model_name_or_path=C:/Users/user/Desktop/Tools/SD auto forge/webui/models/Stable-diffusion/ponyDiffusionV6XL_v6StartWithThisOne.safetensors', '--save_every_n_epochs=1', '--save_model_as=safetensors', '--save_precision=fp16', '--train_batch_size=3', '--train_data_dir=C:\\Users\\user\\Desktop\\LORAimg\\Image', '--xformers']' returned non-zero exit status 1.