bmaltais / kohya_ss


RuntimeError: CUDA out of memory. Tried to allocate 146.00 MiB (GPU 0; 8.00 GiB total capacity; 7.21 GiB already allocated; 0 bytes free; 7.32 GiB reserved in total by PyTorch) #623

Closed: Cynaxia closed this issue 1 year ago

Cynaxia commented 1 year ago

Folder 100_Cynaxia : 1500 steps
max_train_steps = 1500
stop_text_encoder_training = 0
lr_warmup_steps = 0

accelerate launch --num_cpu_threads_per_process=2 "train_db.py" --pretrained_model_name_or_path="E:/stable-diffusion/stable-diffusion-webui/models/Stable-diffusion/WaifuDiffusion.ckpt" --train_data_dir="E:/LORA Training/Cynaxia Live2D w Captions/Cynaxia Live2D LoRA/image" --resolution=512,512 --output_dir="E:/LORA Training/Cynaxia Live2D w Captions/Cynaxia Live2D LoRA/model" --logging_dir="E:/LORA Training/Cynaxia Live2D w Captions/Cynaxia Live2D LoRA/model" --save_model_as=safetensors --output_name="Cynaxialive2d" --max_data_loader_n_workers="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="1500" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --mem_eff_attn --gradient_checkpointing --xformers --bucket_no_upscale

prepare tokenizer
prepare images.
found directory E:\LORA Training\Cynaxia Live2D w Captions\Cynaxia Live2D LoRA\image\100_Cynaxia contains 15 image files
1500 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: False

  [Subset 0 of Dataset 0]
    image_dir: "E:\LORA Training\Cynaxia Live2D w Captions\Cynaxia Live2D LoRA\image\100_Cynaxia"
    image_count: 15
    num_repeats: 100
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: Cynaxia
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 2499.19it/s]
prepare dataset
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
loading u-net:
loading vae:
loading text encoder:
Replace CrossAttention.forward to use FlashAttention (not xformers)
[Dataset 0]
caching latents.
100%|██████████████████████████████████████████████████████████████████████████████████| 15/15 [00:03<00:00, 3.79it/s]
prepare optimizer, data loader etc.
use AdamW optimizer | {}
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 1500
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 1500
  num epochs / epoch数: 1
  batch size per device / バッチサイズ: 1
  total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): 1
  gradient ccumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 1500
steps:   0%|          | 0/1500 [00:00<?, ?it/s]
epoch 1/1
Traceback (most recent call last):
  File "E:\Kohya\kohya_ss\train_db.py", line 435, in <module>
    train(args)
  File "E:\Kohya\kohya_ss\train_db.py", line 315, in train
    accelerator.backward(loss)
  File "E:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 1314, in backward
    self.scaler.scale(loss).backward(**kwargs)
  File "E:\Kohya\kohya_ss\venv\lib\site-packages\torch\_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "E:\Kohya\kohya_ss\venv\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA out of memory. Tried to allocate 146.00 MiB (GPU 0; 8.00 GiB total capacity; 7.21 GiB already allocated; 0 bytes free; 7.32 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
steps:   0%|          | 0/1500 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\Cynax\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Cynax\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\Kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "E:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "E:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "E:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\Kohya\kohya_ss\venv\Scripts\python.exe', 'train_db.py', '--pretrained_model_name_or_path=E:/stable-diffusion/stable-diffusion-webui/models/Stable-diffusion/WaifuDiffusion.ckpt', '--train_data_dir=E:/LORA Training/Cynaxia Live2D w Captions/Cynaxia Live2D LoRA/image', '--resolution=512,512', '--output_dir=E:/LORA Training/Cynaxia Live2D w Captions/Cynaxia Live2D LoRA/model', '--logging_dir=E:/LORA Training/Cynaxia Live2D w Captions/Cynaxia Live2D LoRA/model', '--save_model_as=safetensors', '--output_name=Cynaxialive2d', '--max_data_loader_n_workers=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=1500', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--mem_eff_attn', '--gradient_checkpointing', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
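The hint at the end of the error refers to PyTorch's caching-allocator configuration, which is read from the `PYTORCH_CUDA_ALLOC_CONF` environment variable. Below is a minimal sketch of how that setting could be applied; the 128 MiB split size is purely illustrative, not a value recommended anywhere in this thread. When launching through `accelerate launch`, setting the variable in the same command prompt before starting the run has the same effect, since the training subprocess inherits the environment.

```python
# Sketch: configure the CUDA caching allocator before any GPU memory is allocated.
# max_split_size_mb limits how large a block the allocator will split, which can
# reduce fragmentation when reserved memory is much larger than allocated memory.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # illustrative value

import torch  # the setting is read when the CUDA caching allocator initializes
```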

Struggling to fix this issue. I managed to run 768x768 training the first time; it completed around 15%, and I closed CMD because I didn't have enough time at that point to wait for it. When I launched exactly the same operation later, the CUDA memory error appeared and training wouldn't even start. I tried going down to a lower resolution, 512x512, and it's the same thing: training won't even start. I've read posts on Stack Overflow suggesting running `import torch` followed by `torch.cuda.empty_cache()`, but I'm a newbie and don't really know how or where to do that. Any suggestions/help? Thanks in advance!
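For reference, the `torch.cuda.empty_cache()` suggestion from Stack Overflow is meant to be run inside a Python session, and it only releases memory cached by that same process, so it generally cannot free memory held by a separate, already-running training process. A minimal sketch, assuming the kohya_ss venv interpreter at E:\Kohya\kohya_ss\venv:

```python
# Sketch: release PyTorch's cached-but-unused CUDA memory in the current process.
# Run with e.g. E:\Kohya\kohya_ss\venv\Scripts\python.exe
import torch

torch.cuda.empty_cache()  # frees cached blocks held by *this* process only
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())  # inspect usage
```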

electricbee commented 1 year ago

Looks like you're trying to train a LoRA but accidentally started the DreamBooth trainer instead. I have done the same thing an embarrassing number of times.

Cynaxia commented 1 year ago

> Looks like you're trying to train a LoRA but accidentally started the DreamBooth trainer instead. I have done the same thing an embarrassing number of times.

Yeah, apparently so; the late-night rush to fix the problem did its thing (: Made sure to run LoRA this time and it worked on the first try. Thanks for your help!