running training (training start)
num examples: 6420
num batches per epoch: 6420
num epochs: 1
batch size per device: 1
gradient accumulation steps = 1
total optimization steps: 3000
steps: 0%| | 0/3000 [00:00<?, ?it/s]
epoch 1/1
Traceback (most recent call last):
File "/root/lanyun-tmp/webui/kohya/kohya_ss/sd-scripts/sdxl_train.py", line 818, in <module>
train(args)
File "/root/lanyun-tmp/webui/kohya/kohya_ss/sd-scripts/sdxl_train.py", line 628, in train
optimizer.step()
File "/root/lanyun-tmp/webui/kohya/kohya_ss/myenv/lib/python3.10/site-packages/accelerate/optimizer.py", line 132, in step
self.scaler.step(self.optimizer, closure)
File "/root/lanyun-tmp/webui/kohya/kohya_ss/myenv/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 416, in step
retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
File "/root/lanyun-tmp/webui/kohya/kohya_ss/myenv/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 315, in _maybe_opt_step
retval = optimizer.step(*args, **kwargs)
File "/root/lanyun-tmp/webui/kohya/kohya_ss/myenv/lib/python3.10/site-packages/accelerate/optimizer.py", line 185, in patched_step
return method(*args, **kwargs)
File "/root/lanyun-tmp/webui/kohya/kohya_ss/myenv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
return wrapped(*args, **kwargs)
File "/root/lanyun-tmp/webui/kohya/kohya_ss/myenv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 373, in wrapper
out = func(*args, **kwargs)
File "/root/lanyun-tmp/webui/kohya/kohya_ss/myenv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/lanyun-tmp/webui/kohya/kohya_ss/myenv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 297, in step
self.init_state(group, p, gindex, pindex)
File "/root/lanyun-tmp/webui/kohya/kohya_ss/myenv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/lanyun-tmp/webui/kohya/kohya_ss/myenv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 478, in init_state
state["state1"] = self.get_state_buffer(p, dtype=torch.uint8)
File "/root/lanyun-tmp/webui/kohya/kohya_ss/myenv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 337, in get_state_buffer
return torch.zeros_like(p, dtype=dtype, device=p.device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 23.64 GiB of which 14.81 MiB is free. Process 378976 has 23.62 GiB memory in use. Of the allocated memory 22.68 GiB is allocated by PyTorch, and 464.17 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
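The failure point (allocating an 8-bit optimizer state buffer in `init_state`) is consistent with the GPU already being nearly full when bitsandbytes lazily creates its state tensors on the first `optimizer.step()`. A rough back-of-envelope sketch of why a ~24 GiB card is tight for a full SDXL fine-tune, assuming roughly 2.6B U-Net parameters (an estimate, not a figure from this log):

```python
# Back-of-envelope VRAM estimate for full fine-tuning with an 8-bit Adam
# optimizer. The parameter count is an assumption for illustration.
params = 2.6e9  # approx. SDXL U-Net parameters

weights_fp16 = params * 2      # fp16 weights: 2 bytes per parameter
grads_fp16 = params * 2        # fp16 gradients: 2 bytes per parameter
adam8bit_states = params * 2   # two uint8 state buffers (state1 + state2)

total_gb = (weights_fp16 + grads_fp16 + adam8bit_states) / 1024**3
print(f"~{total_gb:.1f} GiB before activations, text encoders, and VAE")
```

Even under these optimistic assumptions the static tensors alone land around 14–15 GiB, leaving little headroom for activations at batch size 1, which matches the log's "22.68 GiB is allocated by PyTorch" just before the 20 MiB state-buffer allocation fails.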
steps: 0%| | 0/3000 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/root/lanyun-tmp/webui/kohya/kohya_ss/myenv/bin/accelerate", line 8, in
sys.exit(main())
File "/root/lanyun-tmp/webui/kohya/kohya_ss/myenv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/root/lanyun-tmp/webui/kohya/kohya_ss/myenv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
simple_launcher(args)
File "/root/lanyun-tmp/webui/kohya/kohya_ss/myenv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/lanyun-tmp/webui/kohya/kohya_ss/myenv/bin/python3', '/root/lanyun-tmp/webui/kohya/kohya_ss/sd-scripts/sdxl_train.py', '--config_file', '/root/lanyun-tmp/webui/config_dreambooth-20240714-125801.toml']' returned non-zero exit status 1.
12:58:59-972829 INFO Training has ended.
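The OOM message itself points at `PYTORCH_CUDA_ALLOC_CONF` and `max_split_size_mb` as a mitigation for fragmentation. A minimal sketch of applying it before CUDA is initialized; the value 128 is an illustrative assumption, not a recommendation from the log:

```python
import os

# Must be set before torch allocates on the GPU (i.e. before importing torch
# in the training process, or exported in the shell that runs accelerate).
# max_split_size_mb caps allocator block splitting to reduce fragmentation.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Note that fragmentation tuning only helps at the margins here: with 14.81 MiB free out of 23.64 GiB, reducing the footprint itself (e.g. sd-scripts' `--gradient_checkpointing`, or training at lower resolution) is more likely to get past the first optimizer step.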