bmaltais / kohya_ss

Apache License 2.0

cuda: Out of memory issue on rtx3090 (24GB vram) #3

Closed. dikasterion closed this issue 1 year ago.

dikasterion commented 1 year ago

Hi, I successfully managed to run your repo with the SD1.5 model. Now I'm trying to run the SD2.0 768 model, but I get a CUDA out of memory error.

I have 23 training images (768*768) in a 20_person folder under the train_person folder. I tried lowering the batch size and disabling cache latents (setting it to 0). Here are the settings I ran in PowerShell with the venv (virtual environment):

# variable values

$pretrained_model_name_or_path = "D:\stable-diffusion-webui\models\Stable-diffusion\768-v-ema.ckpt"
$data_dir = "D:\kohya_ss\zwx_person_db\train_person"
$logging_dir = "D:\kohya_ss\log"
$output_dir = "D:\kohya_ss\output"
$resolution = "768,768"
$lr_scheduler = "polynomial"
$cache_latents = 0 # 1 = true, 0 = false

$image_num = Get-ChildItem $data_dir -Recurse -File -Include *.png, *.jpg, *.webp | Measure-Object | %{$_.Count}

Write-Output "image_num: $image_num"

$dataset_repeats = 2000
$learning_rate = 1e-6
$train_batch_size = 1
$epoch = 1
$save_every_n_epochs = 1
$mixed_precision = "bf16"
$num_cpu_threads_per_process = 6

# You should not have to change values past this point

if ($cache_latents -eq 1) { $cache_latents_value="--cache_latents" } else { $cache_latents_value="" }

$repeats = $image_num * $dataset_repeats
$mts = [Math]::Ceiling($repeats / $train_batch_size * $epoch)

Write-Output "Repeats: $repeats"

cd D:\kohya_ss
.\venv\Scripts\activate

accelerate launch --num_cpu_threads_per_process $num_cpu_threads_per_process train_db_fixed.py `
    --v2 --v_parameterization `
    --pretrained_model_name_or_path=$pretrained_model_name_or_path `
    --train_data_dir=$data_dir `
    --output_dir=$output_dir `
    --resolution=$resolution `
    --train_batch_size=$train_batch_size `
    --learning_rate=$learning_rate `
    --max_train_steps=$mts `
    --use_8bit_adam --xformers `
    --mixed_precision=$mixed_precision $cache_latents_value `
    --save_every_n_epochs=$save_every_n_epochs `
    --logging_dir=$logging_dir `
    --save_precision="fp16" `
    --seed=494481440 `
    --lr_scheduler=$lr_scheduler

# Add the inference 768v yaml file along with the model for proper loading. It needs to have the same name as the model... most likely "last.yaml" in our case.

cp v2_inference\v2-inference-v.yaml $output_dir"\last.yaml"

And here's the error message I got:

steps:   0%|          | 0/46000 [00:00<?, ?it/s]
epoch 1/100
Traceback (most recent call last):
  File "D:\kohya_ss\train_db_fixed.py", line 2098, in <module>
    train(args)
  File "D:\kohya_ss\train_db_fixed.py", line 1948, in train
    optimizer.step()
  File "D:\kohya_ss\venv\lib\site-packages\accelerate\optimizer.py", line 134, in step
    self.scaler.step(self.optimizer, closure)
  File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\amp\grad_scaler.py", line 338, in step
    retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
  File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\amp\grad_scaler.py", line 285, in _maybe_opt_step
    retval = optimizer.step(*args, **kwargs)
  File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\optim\lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\optim\optimizer.py", line 113, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\kohya_ss\venv\lib\site-packages\bitsandbytes\optim\optimizer.py", line 263, in step
    self.init_state(group, p, gindex, pindex)
  File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\kohya_ss\venv\lib\site-packages\bitsandbytes\optim\optimizer.py", line 401, in init_state
    state["state2"] = torch.zeros_like(
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 24.00 GiB total capacity; 10.13 GiB already allocated; 8.91 GiB free; 10.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
steps:   0%|          | 0/46000 [00:30<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1069, in launch_command
    simple_launcher(args)
  File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 551, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\kohya_ss\venv\Scripts\python.exe', 'train_db_fixed.py', '--v2', '--v_parameterization', '--pretrained_model_name_or_path=D:\stable-diffusion-webui\models\Stable-diffusion\768-v-ema.ckpt', '--train_data_dir=D:\kohya_ss\zwx_person_db\train_person', '--output_dir=D:\kohya_ss\output', '--resolution=768,768', '--train_batch_size=1', '--learning_rate=1E-06', '--max_train_steps=46000', '--use_8bit_adam', '--xformers', '--mixed_precision=bf16', '--save_every_n_epochs=1', '--logging_dir=D:\kohya_ss\log', '--save_precision=fp16', '--seed=494481440', '--lr_scheduler=polynomial']' returned non-zero exit status 1.
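The error message itself suggests trying max_split_size_mb when reserved memory is much larger than allocated memory. A minimal sketch of setting that allocator option from PowerShell before launching (the 128 value here is just an illustrative assumption, not a recommendation from this repo):

```powershell
# Set the PyTorch CUDA allocator config for this PowerShell session only.
# max_split_size_mb limits block splitting to reduce fragmentation; see the
# PyTorch memory-management docs referenced in the error message.
$env:PYTORCH_CUDA_ALLOC_CONF = "max_split_size_mb:128"

# Then launch training as before, e.g.:
# accelerate launch --num_cpu_threads_per_process $num_cpu_threads_per_process train_db_fixed.py ...
```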

dikasterion commented 1 year ago

Oh, I think it's running with global packages even though I've followed all the steps in the readme page... Does anyone have any idea why? I activated ".\venv\Scripts\activate" and ran the setup above inside the virtual environment...
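The traceback does show torch being loaded from the global Python310 site-packages rather than the venv, so one quick check is to confirm which interpreter and torch install actually resolve after activation. A diagnostic sketch (not from the repo's docs):

```powershell
# With the venv activated, confirm which python.exe resolves first on PATH:
Get-Command python | Select-Object -ExpandProperty Source
# Expected: something like D:\kohya_ss\venv\Scripts\python.exe

# Confirm which interpreter runs and where torch is imported from:
python -c "import sys, torch; print(sys.executable); print(torch.__file__)"
```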

toyxyz commented 1 year ago

How about using 8-bit Adam and xformers? They can reduce VRAM usage.

dikasterion commented 1 year ago

> How about using 8-bit Adam and xformers? They can reduce VRAM usage.

Oh, as in the settings above, I already enabled 8-bit Adam and xformers.

I finally managed to run v2 768 successfully. I ran Windows PowerShell as administrator, and then I could train with a batch size of 1. It fails with 2 or above, but I'm content with what I got. Thank you, guys.
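As a side note, the OOM above hit with only about 10 GiB allocated/reserved by PyTorch on a 24 GiB card, so when batch size 2 still fails it can be worth checking what else is holding VRAM (for example, a web UI left running on the same GPU). A general diagnostic sketch, not something from this thread:

```powershell
# Poll GPU memory once per second while training runs (Ctrl+C to stop).
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1

# Or list the processes currently holding GPU memory:
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```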