bmaltais / kohya_ss

Apache License 2.0
9.54k stars 1.23k forks

OutOfMemoryError: CUDA out of memory. #1202

Closed GeorgeMonkey1 closed 8 months ago

GeorgeMonkey1 commented 1 year ago

Hello,

10 days ago I made my first LoRA with Kohya_ss. It worked very well and the result is nice. I trained 2000 steps with 20 pictures at 768x768, 10 epochs, with mixed precision set to No, because when I set fp16 I get a Loss: NaN problem and the result is bad.

Today I wanted to make my second LoRA and followed the exact same process: 20 pictures at 768x768, mixed precision set to No, 10 epochs. But today it doesn't work; it stops at around 5 steps out of 2000 and then crashes with OutOfMemoryError: CUDA out of memory.

Here is the full error:

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 5.24 GiB already allocated; 0 bytes free; 5.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
steps: 0%| | 1/2100 [00:03<2:05:34, 3.59s/it, loss=0.473]
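The error message itself points at PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of applying that hint before launching training; the 64 MiB value is an assumption to start from, not a verified fix:

```shell
# Sketch: apply the error's own max_split_size_mb hint to reduce fragmentation.
# 64 is an assumed starting value; tune it for your card.
export PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:64"
# On Windows cmd, use:  set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:64
```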

I don't know if I need to paste the whole console output here, or if this is enough?

I already tried some solutions I found here and on Reddit: 512x512 instead of 768x768, rerunning the exact same LoRA I had already completed, disabling xformers, putting a cudaall.dll in the bitsandbytes folder, and reinstalling, but none of them worked.

I have a 1660 Super with 6 GB VRAM, by the way.

Any ideas? Thanks in advance for any answer!

Jackfritt commented 1 year ago

I only have this problem when other processes are also using the GPU. Try closing all other running processes to make sure only kohya_ss uses it.
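This suggestion can be checked concretely. Assuming the NVIDIA driver is installed, nvidia-smi can list which processes currently hold VRAM:

```shell
# Sketch: show processes currently using GPU memory (assumes the NVIDIA driver is present).
gpu_procs() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
  else
    echo "nvidia-smi not found"
  fi
}
gpu_procs
```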

GeorgeMonkey1 commented 1 year ago

> I only have this problem when other processes are also using the GPU. Try closing all other running processes to make sure only kohya_ss uses it.

Hello, thanks for your answer. I tried it; basically I only had Edge running, but it didn't help. Do you have any other ideas?

Jackfritt commented 1 year ago

Sorry, I don't work with Windows... good luck.

5nail000 commented 1 year ago

Hi! For two days, I struggled with Runpod as the training wouldn't start... I hadn't used my scripts for a couple of months, and suddenly everything stopped working with an OUT OF MEMORY ERROR. I struggled and struggled... I tried different ways of installing kohya_ss with both the 'ashleykza/stable-diffusion-webui:1.5.3' and 'runpod/stable-diffusion:web-ui-9.1.0' builds, but all in vain.

BUT!!!! ))))) By accident, everything started working when I tried an A4000 GPU instead of the RTX 3xxx / 4xxx! That's something! @bmaltais please take note; perhaps something needs fixing...

Here are the steps I took to set it up:

apt update
cd /workspace
git clone https://github.com/bmaltais/kohya_ss.git
cd /workspace/kohya_ss
git checkout dev2
./setup-runpod.sh
source /workspace/kohya_ss/venv/bin/activate
accelerate launch --num_cpu_threads_per_process 1 /workspace/kohya_ss/train_network.py \
  --pretrained_model_name_or_path /workspace/stable-diffusion-webui/models/Stable-diffusion/Reliberate.safetensors \
  --train_data_dir /workspace/inputs \
  --reg_data_dir /workspace/reg_images/ \
  --output_dir /workspace/outputs/ \
  --dataset_config /workspace/dataset_config.toml \
  --output_name word \
  --learning_rate 1e-4 \
  --resolution 512 \
  --max_train_epochs 10 \
  --save_every_n_epochs 1 \
  --train_batch_size 1 \
  --caption_extension none \
  --network_module networks.lora \
  --network_dim 192 \
  --network_alpha 192 \
  --save_model_as safetensors \
  --save_precision fp16 \
  --mixed_precision fp16 \
  --use_8bit_adam \
  --cache_latents \
  --text_encoder_lr 0.00005 \
  --unet_lr 0.0001 \
  --max_data_loader_n_workers 0 \
  --bucket_no_upscale

Loadus commented 1 year ago

> I have a 1660 Super 6gb VRAM by the way.

6 GB is well below the line for 768px training. You can do it, but I wouldn't recommend it. There are ways to squeeze more into the VRAM, like using xformers, 8-bit Adam, etc.; there are tutorial videos on YouTube that can help you get the maximum out of your card.
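As a sketch of what "squeezing more into VRAM" might look like for train_network.py: the flag set below assumes a recent kohya_ss / sd-scripts, and the network_dim and resolution values are guesses that trade quality for memory. fp16 mixed precision is left out because the original poster reported Loss: NaN with it on this card.

```shell
# Sketch: VRAM-saving options for kohya's train_network.py on a 6 GB card.
# network_dim 32 and resolution 512 are assumed values, not verified settings.
VRAM_FLAGS="--xformers --gradient_checkpointing --cache_latents \
  --train_batch_size 1 --resolution 512 --network_dim 32 --network_alpha 32 --use_8bit_adam"
echo "$VRAM_FLAGS"
```

These would be appended to the accelerate launch command shown earlier in the thread.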

I struggled with training even with 8 GB, and went and bought a 3060 with 12 GB just because I got tired of fighting the memory limits.