Closed: MTNZLVSK closed this issue 7 months ago
I have the same problem, even with batch size 1.
Try lowering your maximum bucket resolution to 1024.
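For context on what that changes, here is a rough sketch of aspect-ratio bucketing. This is a hypothetical illustration, not kohya_ss's actual implementation: it starts from the bucket whose area matches `max_reso**2` at the image's aspect ratio, clamps the longer side to the bucket ceiling, and snaps both edges down to a multiple of the bucket step (64 in the log below).

```python
import math

# Hypothetical sketch (NOT kohya_ss's actual code) of aspect-ratio bucketing.
# Assumes aspect_ratio >= 1 (landscape). max_edge is the bucket ceiling
# being discussed; steps mirrors bucket_reso_steps=64 from the log below.
def bucket_for(aspect_ratio: float, max_reso: int = 1024,
               max_edge: int = 2048, steps: int = 64) -> tuple[int, int]:
    w = math.sqrt(max_reso * max_reso * aspect_ratio)
    w = min(w, max_edge)              # this clamp is what lowering the ceiling changes
    h = w / aspect_ratio
    return int(w // steps * steps), int(h // steps * steps)

print(bucket_for(1984 / 512))                 # -> (1984, 512), the widest bucket in the log
print(bucket_for(1984 / 512, max_edge=1024))  # -> (1024, 256), far smaller activations
```

Note, though, that the training log below states that `min_bucket_reso` and `max_bucket_reso` are ignored when `bucket_no_upscale` is set (bucket resolutions then come from the image sizes), so for a lower ceiling to take effect you would likely also need to disable that flag or pre-resize the largest source images.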
I redistributed my hard-drive space and the problem fixed itself. If anyone has any idea as to how, please comment on this one.
I'm using an RTX 2080 Ti graphics card with 22 GB of VRAM (second-hand, self-modified). My previous LoRAs were trained in Sep '23 on an RTX 3090 with 24 GB of VRAM, and that worked fine with fairly satisfying results.
I only resumed training LoRAs this month, and my old settings no longer seem to work. I've searched for solutions but found only very generic guides on resolving the 'CUDA out of memory' error; after two days I still haven't found a working configuration.
The confusing part is that this 'CUDA OOM' error still occurs even though I have more than 10 GB of VRAM free. I'm not sure what's happening.
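For anyone debugging this, a quick way to compare the driver's view of free VRAM with what PyTorch's caching allocator is holding (a large "reserved minus allocated" gap is where fragmentation hides). This is a generic diagnostic sketch, guarded so it degrades gracefully when torch or a CUDA device is absent:

```python
import importlib.util

# Compare driver-level free VRAM against PyTorch's allocator bookkeeping.
def cuda_memory_report() -> str:
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    if not torch.cuda.is_available():
        return "no CUDA device"
    free, total = torch.cuda.mem_get_info()    # driver-level free/total bytes
    reserved = torch.cuda.memory_reserved()    # held by the caching allocator
    allocated = torch.cuda.memory_allocated()  # actually backing live tensors
    gib = 1024 ** 3
    return (f"free {free / gib:.2f}/{total / gib:.2f} GiB, "
            f"reserved {reserved / gib:.2f} GiB, "
            f"allocated {allocated / gib:.2f} GiB")

print(cuda_memory_report())
```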
Below are the detailed logs from the terminal running 'gui.bat':
prepare tokenizers
update token length: 225
Using DreamBooth method.
prepare images.
found directory D:\fauxart projects\fauxart Hiroshi Sato\Training project\hiroshi_sato scene test\img\30_satoscene hiroshisato contains 48 image files
found directory D:\fauxart projects\fauxart Hiroshi Sato\Training project\hiroshi_sato scene test\reg\1_hiroshisato contains 178 image files
No caption file found for 178 images. Training will continue without captions for these images. If class token exists, it will be used.
D:\fauxart projects\fauxart Hiroshi Sato\Training project\hiroshi_sato scene test\reg\1_hiroshisato\001.jpg
D:\fauxart projects\fauxart Hiroshi Sato\Training project\hiroshi_sato scene test\reg\1_hiroshisato\002.jpg
D:\fauxart projects\fauxart Hiroshi Sato\Training project\hiroshi_sato scene test\reg\1_hiroshisato\003.jpg
D:\fauxart projects\fauxart Hiroshi Sato\Training project\hiroshi_sato scene test\reg\1_hiroshisato\004.jpg
D:\fauxart projects\fauxart Hiroshi Sato\Training project\hiroshi_sato scene test\reg\1_hiroshisato\005.jpg
D:\fauxart projects\fauxart Hiroshi Sato\Training project\hiroshi_sato scene test\reg\1_hiroshisato\006.jpg... and 173 more
1440 train images with repeating.
178 reg images.
[Dataset 0]
  batch_size: 2
  resolution: (1024, 1024)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: True
[Subset 0 of Dataset 0]
  image_dir: "D:\fauxart projects\fauxart Hiroshi Sato\Training project\hiroshi_sato scene test\img\30_satoscene hiroshisato"
  image_count: 48
  num_repeats: 30
  shuffle_caption: True
  keep_tokens: 0
  keep_tokens_separator:
  caption_dropout_rate: 0.0
  caption_dropout_every_n_epoches: 0
  caption_tag_dropout_rate: 0.0
  caption_prefix: None
  caption_suffix: None
  color_aug: False
  flip_aug: False
  face_crop_aug_range: None
  random_crop: False
  token_warmup_min: 1
  token_warmup_step: 0
  is_reg: False
  class_tokens: satoscene hiroshisato
  caption_extension: .txt
[Subset 1 of Dataset 0]
  image_dir: "D:\fauxart projects\fauxart Hiroshi Sato\Training project\hiroshi_sato scene test\reg\1_hiroshisato"
  image_count: 178
  num_repeats: 1
  shuffle_caption: True
  keep_tokens: 0
  keep_tokens_separator:
  caption_dropout_rate: 0.0
  caption_dropout_every_n_epoches: 0
  caption_tag_dropout_rate: 0.0
  caption_prefix: None
  caption_suffix: None
  color_aug: False
  flip_aug: False
  face_crop_aug_range: None
  random_crop: False
  token_warmup_min: 1
  token_warmup_step: 0
  is_reg: True
  class_tokens: hiroshisato
  caption_extension: .txt
[Dataset 0]
loading image sizes.
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 226/226 [00:00<00:00, 11266.97it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically
number of images (including repeats) per bucket:
bucket 0: resolution (576, 1536), count: 8
bucket 1: resolution (640, 1344), count: 8
bucket 2: resolution (704, 1280), count: 38
bucket 3: resolution (704, 1408), count: 8
bucket 4: resolution (768, 1152), count: 8
bucket 5: resolution (768, 1216), count: 8
bucket 6: resolution (768, 1280), count: 8
bucket 7: resolution (768, 1344), count: 8
bucket 8: resolution (832, 1088), count: 132
bucket 9: resolution (832, 1152), count: 17
bucket 10: resolution (832, 1216), count: 56
bucket 11: resolution (896, 1024), count: 125
bucket 12: resolution (896, 1088), count: 135
bucket 13: resolution (896, 1152), count: 80
bucket 14: resolution (960, 960), count: 197
bucket 15: resolution (960, 1024), count: 79
bucket 16: resolution (960, 1088), count: 8
bucket 17: resolution (1024, 896), count: 220
bucket 18: resolution (1024, 960), count: 308
bucket 19: resolution (1024, 1024), count: 16
bucket 20: resolution (1088, 832), count: 306
bucket 21: resolution (1088, 896), count: 363
bucket 22: resolution (1152, 768), count: 8
bucket 23: resolution (1152, 832), count: 124
bucket 24: resolution (1152, 896), count: 92
bucket 25: resolution (1216, 768), count: 33
bucket 26: resolution (1216, 832), count: 202
bucket 27: resolution (1280, 768), count: 55
bucket 28: resolution (1344, 640), count: 8
bucket 29: resolution (1344, 704), count: 76
bucket 30: resolution (1408, 704), count: 70
bucket 31: resolution (1536, 576), count: 38
bucket 32: resolution (1984, 512), count: 38
mean ar error (without repeats): 0.019153709688548896
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: D:/stable-diffusion-webui/models/Stable-diffusion/dreamshaperXL_turboDpmppSDE.safetensors
building U-Net
loading U-Net from checkpoint
U-Net:
building text encoders
loading text encoders from checkpoint
text encoder 1:
text encoder 2:
building VAE
loading VAE from checkpoint
VAE:
Enable memory efficient attention for U-Net
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
import network module: networks.lora
[Dataset 0]
caching latents.
checking cache validity...
100%|██████████████████████████████████████████████████████████████████████████████████████████████| 226/226 [00:00<00:00, 112954.33it/s]
caching latents...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 226/226 [02:07<00:00, 1.77it/s]
create LoRA network. base dim (rank): 32, alpha: 32.0
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder 1:
create LoRA for Text Encoder 2:
create LoRA for Text Encoder: 264 modules.
create LoRA for U-Net: 722 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
use Prodigy optimizer | {}
enable full fp16 training.
running training
num train images * repeats: 1440
num reg images: 178
num batches per epoch: 1444
num epochs: 8
batch size per device: 2
gradient accumulation steps: 1
total optimization steps: 11520
steps: 0%| | 0/11520 [00:00<?, ?it/s]
epoch 1/8
Traceback (most recent call last):
File "D:\kohya_ss GUI\kohya_ss\sdxl_train_network.py", line 189, in <module>
trainer.train(args)
File "D:\kohya_ss GUI\kohya_ss\train_network.py", line 783, in train
noise_pred = self.call_unet(
File "D:\kohya_ss GUI\kohya_ss\sdxl_train_network.py", line 169, in call_unet
noise_pred = unet(noisy_latents, timesteps, text_embedding, vector_embedding)
File "D:\kohya_ss GUI\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\kohya_ss GUI\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 680, in forward
return model_forward(*args, **kwargs)
File "D:\kohya_ss GUI\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 668, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "D:\kohya_ss GUI\kohya_ss\venv\lib\site-packages\torch\amp\autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "D:\kohya_ss GUI\kohya_ss\library\sdxl_original_unet.py", line 1099, in forward
h = call_module(module, h, emb, context)
File "D:\kohya_ss GUI\kohya_ss\library\sdxl_original_unet.py", line 1088, in call_module
x = layer(x, emb)
File "D:\kohya_ss GUI\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\kohya_ss GUI\kohya_ss\library\sdxl_original_unet.py", line 343, in forward
x = torch.utils.checkpoint.checkpoint(create_custom_forward(self.forward_body), x, emb, use_reentrant=USE_REENTRANT)
File "D:\kohya_ss GUI\kohya_ss\venv\lib\site-packages\torch\utils\checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "D:\kohya_ss GUI\kohya_ss\venv\lib\site-packages\torch\autograd\function.py", line 506, in apply
return super().apply(*args, **kwargs)  # type: ignore[misc]
File "D:\kohya_ss GUI\kohya_ss\venv\lib\site-packages\torch\utils\checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "D:\kohya_ss GUI\kohya_ss\library\sdxl_original_unet.py", line 339, in custom_forward
return func(*inputs)
File "D:\kohya_ss GUI\kohya_ss\library\sdxl_original_unet.py", line 326, in forward_body
h = self.in_layers(x)
File "D:\kohya_ss GUI\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\kohya_ss GUI\kohya_ss\venv\lib\site-packages\torch\nn\modules\container.py", line 217, in forward
input = module(input)
File "D:\kohya_ss GUI\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\kohya_ss GUI\kohya_ss\library\sdxl_original_unet.py", line 289, in forward
return super().forward(x)
File "D:\kohya_ss GUI\kohya_ss\venv\lib\site-packages\torch\nn\modules\normalization.py", line 273, in forward
return F.group_norm(
File "D:\kohya_ss GUI\kohya_ss\venv\lib\site-packages\torch\nn\functional.py", line 2530, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 38.00 MiB (GPU 0; 22.00 GiB total capacity; 8.36 GiB already allocated; 12.24 GiB free; 8.54 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
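The error message's own hint points at allocator fragmentation: reserved memory (8.54 GiB) exceeds allocated (8.36 GiB) while 12.24 GiB is still free at the driver level, yet a 38 MiB allocation fails. A minimal sketch of acting on that hint, setting `PYTORCH_CUDA_ALLOC_CONF` before CUDA is initialized; the 512 MiB value is my assumption to experiment with, not something taken from this log:

```python
import os

# Cap how large the caching allocator's blocks can be split, which can
# reduce fragmentation (per the OOM message's suggestion). Must be set
# before `import torch` / before CUDA is first initialized.
# NOTE: max_split_size_mb=512 is an assumed starting point, not a value
# recommended anywhere in this log.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
```

When launching through 'gui.bat', the equivalent would be running `set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512` in the console (or adding it to the batch file) before starting training.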
steps: 0%| | 0/11520 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\kohya_ss GUI\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "D:\kohya_ss GUI\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
args.func(args)
File "D:\kohya_ss GUI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
simple_launcher(args)
File "D:\kohya_ss GUI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\kohya_ss GUI\kohya_ss\venv\Scripts\python.exe', './sdxl_train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=D:/stable-diffusion-webui/models/Stable-diffusion/dreamshaperXL_turboDpmppSDE.safetensors', '--train_data_dir=D:/fauxart projects/fauxart Hiroshi Sato/Training project/hiroshi_sato scene test/img', '--reg_data_dir=D:/fauxart projects/fauxart Hiroshi Sato/Training project/hiroshi_sato scene test/reg', '--resolution=1024,1024', '--output_dir=D:/fauxart projects/fauxart Hiroshi Sato/Training project/hiroshi_sato scene test/model', '--logging_dir=D:/fauxart projects/fauxart Hiroshi Sato/Training project/hiroshi_sato scene test/log', '--network_alpha=32', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=1.0', '--unet_lr=1.0', '--network_dim=32', '--output_name=hiroshisato_scene', '--lr_scheduler_num_cycles=8', '--no_half_vae', '--learning_rate=1.0', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=11520', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--caption_extension=.txt', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=Prodigy', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--max_token_length=225', '--bucket_reso_steps=64', '--mem_eff_attn', '--shuffle_caption', '--gradient_checkpointing', '--full_fp16', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0', '--sample_sampler=k_dpm_2', '--sample_prompts=D:/fauxart projects/fauxart Hiroshi Sato/Training project/hiroshi_sato scene test/model\sample\prompt.txt', '--sample_every_n_epochs=1']' returned non-zero exit status 1.