bmaltais / kohya_ss


"CUDA error: out of memory" using RTX 2080Ti with 11G of VRAM #601

Closed · SkymillRobbie closed this 1 year ago

SkymillRobbie commented 1 year ago

I'm currently trying to train a LoRA on an RTX 2080 Ti with 11 GB of VRAM. I'm only using 10 images, all at 512x512 resolution.

Whenever I run it, though, I get this error and training never completes.

Any idea how I can resolve this?

```
prepare tokenizer
prepare images.
found directory M:\Games\Rax\Stable Diffusion\Kohya\Process Lora\zeraora lora\img\150_zeraora contains 10 image files
1500 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1024
  bucket_reso_steps: 64
  bucket_no_upscale: True

  [Subset 0 of Dataset 0]
    image_dir: "M:\Games\Rax\Stable Diffusion\Kohya\Process Lora\zeraora lora\img\150_zeraora"
    image_count: 10
    num_repeats: 150
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: zeraora
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 1428.38it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 512), count: 1500
mean ar error (without repeats): 0.0
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
loading u-net:
loading vae:
loading text encoder:
Replace CrossAttention.forward to use xformers
[Dataset 0]
caching latents.
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:02<00:00, 3.77it/s]
prepare optimizer, data loader etc.
use AdamW optimizer | {}
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 1500
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 750
  num epochs / epoch数: 1
  batch size per device / バッチサイズ: 2
  total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): 2
  gradient ccumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 750
steps:   0%|          | 0/750 [00:00<?, ?it/s]epoch 1/1
Traceback (most recent call last):
  File "M:\Games\Rax\Stable Diffusion\Kohya\kohya_ss\train_db.py", line 429, in <module>
    train(args)
  File "M:\Games\Rax\Stable Diffusion\Kohya\kohya_ss\train_db.py", line 317, in train
    optimizer.step()
  File "C:\Users\Redd\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\optimizer.py", line 134, in step
    self.scaler.step(self.optimizer, closure)
  File "C:\Users\Redd\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\amp\grad_scaler.py", line 338, in step
    retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
  File "C:\Users\Redd\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\amp\grad_scaler.py", line 285, in _maybe_opt_step
    retval = optimizer.step(*args, **kwargs)
  File "C:\Users\Redd\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\optim\lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "C:\Users\Redd\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\optim\optimizer.py", line 113, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Redd\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Redd\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\optim\adamw.py", line 148, in step
    state['exp_avg_sq'] = torch.zeros_like(p, memory_format=torch.preserve_format)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

steps:   0%|          | 0/750 [00:08<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\Redd\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Redd\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Redd\AppData\Local\Programs\Python\Python310\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\Redd\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\Users\Redd\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\Users\Redd\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\Redd\AppData\Local\Programs\Python\Python310\python.exe', 'train_db.py', '--enable_bucket', '--pretrained_model_name_or_path=M:/Games/Rax/Stable Diffusion/stable-diffusion-webui/models/Stable-diffusion/AbyssOrangeMix2_hard.safetensors', '--train_data_dir=M:\Games\Rax\Stable Diffusion\Kohya\Process Lora\zeraora lora\img', '--resolution=512,512', '--output_dir=M:\Games\Rax\Stable Diffusion\Kohya\Process Lora\zeraora lora\model', '--logging_dir=M:\Games\Rax\Stable Diffusion\Kohya\Process Lora\zeraora lora\log', '--save_model_as=safetensors', '--output_name=zeraora', '--max_data_loader_n_workers=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=750', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
```
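
For anyone hitting the same wall: before changing training settings, it can be worth confirming how much of the 11 GB is actually free at launch time, since anything else holding the GPU (a running Stable Diffusion web UI, for example) eats into it. Here is a minimal sketch, assuming a reasonably recent PyTorch and that the training card is CUDA device 0:

```python
# Minimal VRAM sanity check before launching training (a sketch; device 0 is an assumption).
import torch

device = torch.device("cuda:0")
free_bytes, total_bytes = torch.cuda.mem_get_info(device)  # returns (free, total) in bytes

print(f"GPU        : {torch.cuda.get_device_name(device)}")
print(f"Free VRAM  : {free_bytes / 1024**3:.2f} GiB")
print(f"Total VRAM : {total_bytes / 1024**3:.2f} GiB")
```

If several gigabytes are already in use, the optimizer-state allocation shown in the traceback above is the first thing that will fail.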

bmaltais commented 1 year ago

Try using AdamW8bit if your card supports it... and maybe dropping the train batch size to 1.
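
To put rough numbers on that suggestion: the traceback dies exactly where AdamW allocates its second-moment buffer (`state['exp_avg_sq']`), and with full Dreambooth fine-tuning the optimizer state alone is huge. A back-of-the-envelope sketch, where the ~860M U-Net parameter count is an approximation rather than a value measured from this checkpoint:

```python
# Rough estimate of why the OOM hits at optimizer.step() (a sketch; ~860M is an
# approximate SD 1.x U-Net parameter count, not an exact figure for this model).
unet_params = 860_000_000
fp32 = 4  # bytes per fp32 value

weights    = unet_params * fp32      # model weights
grads      = unet_params * fp32      # gradients
adam_state = 2 * unet_params * fp32  # AdamW's exp_avg + exp_avg_sq buffers

total_gib = (weights + grads + adam_state) / 1024**3
print(f"~{total_gib:.1f} GiB before activations or batch data")
# AdamW8bit keeps the two moment buffers in 8-bit (1 byte each) instead of fp32,
# shrinking the optimizer state from ~6.4 GiB to ~1.6 GiB in this estimate, and a
# batch size of 1 roughly halves the activation memory on top of that.
```

In the command from the log, that should correspond to something like `--optimizer_type=AdamW8bit` and `--train_batch_size=1`, if I'm reading the GUI's generated flags right.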

12019saccount commented 1 year ago

I set it to AdamW8bit and a batch size of 1, but it still has the same problem, sorry.

bmaltais commented 1 year ago

Have you tried with AdamW instead?

SkymillRobbie commented 1 year ago

My mistake! I realized I was actually using the "Dreambooth" tab instead of the "Dreambooth LoRA" tab. It works great now!
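
For anyone landing here later: the log above shows `train_db.py`, i.e. full Dreambooth fine-tuning of the whole U-Net, whereas the Dreambooth LoRA tab trains only small low-rank adapter matrices, which is why the same 11 GB card copes with it. A rough illustration of the difference for a single linear layer; the 768 width and rank 8 below are illustrative assumptions, not this thread's actual settings:

```python
# Trainable-parameter comparison for one 768x768 linear layer:
# full fine-tuning vs. a LoRA adapter of rank 8 (both numbers illustrative).
in_features = out_features = 768
rank = 8

full_params = in_features * out_features           # every weight gets gradients and AdamW state
lora_params = rank * (in_features + out_features)  # down-projection A plus up-projection B

print(f"full fine-tune : {full_params:,} trainable params")
print(f"LoRA (rank {rank})  : {lora_params:,} trainable params "
      f"({100 * lora_params / full_params:.2f}% of the full layer)")
```

With only a couple of percent of the weights receiving optimizer state, the AdamW buffers that triggered the original out-of-memory error become negligible.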