XavierXiao / Dreambooth-Stable-Diffusion

Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
MIT License
7.6k stars · 795 forks

CUDA out of memory, how to adjust batch size #70

Open toonpeters opened 2 years ago

toonpeters commented 2 years ago

When running the main training file with python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume sd-v1-4-full-ema.ckpt -n tr_job --gpus 0,1 --data_root training_images/ --reg_data_root regularization_images/person_ddim/ --class_word sks, CUDA keeps running out of memory. What are the options apart from upgrading to a GPU with more VRAM?

I tried most of the options above, but nothing works. Does anybody have other options?

OS:

orydatadudes commented 2 years ago

Same problem here: CUDA out of memory using a 24 GB GPU.

Fikxzer commented 2 years ago

As far as I know, this repo needs a GPU with at least 12.5 GB of VRAM; it works fine on my 3090 Ti. I'm not sure it works with multiple GPUs, so I would try with just "--gpus 0," (a single GPU). See the example command below.
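For reference, a single-GPU run with the smallest batch size might look like the sketch below. The data.params.batch_size=1 override assumes main.py still merges extra dot-list arguments into the config via OmegaConf, as the original latent-diffusion code does; if it doesn't, edit the batch size directly in the data: block of v1-finetune_unfrozen.yaml instead. Paths and the run name are just the ones from the original post.

```bash
# Single GPU, smallest batch size. The trailing comma in "--gpus 0," is needed
# so PyTorch Lightning parses it as a list of device indices rather than an int.
python main.py \
    --base configs/stable-diffusion/v1-finetune_unfrozen.yaml \
    -t \
    --actual_resume sd-v1-4-full-ema.ckpt \
    -n tr_job \
    --gpus 0, \
    --data_root training_images/ \
    --reg_data_root regularization_images/person_ddim/ \
    --class_word sks \
    data.params.batch_size=1  # dot-list config override; key name may differ in this repo's YAML
```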

titusfx commented 1 year ago

@toonpeters look here:

https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth

Use the table below to choose the best flags based on your memory and speed requirements. Tested on a Tesla T4 GPU.

| fp16 | train_batch_size | gradient_accumulation_steps | gradient_checkpointing | use_8bit_adam | VRAM usage (GB) | Speed (it/s) |
|------|------------------|-----------------------------|------------------------|---------------|-----------------|--------------|
| fp16 | 1 | 1 | TRUE  | TRUE  | 9.92  | 0.93 |
| no   | 1 | 1 | TRUE  | TRUE  | 10.08 | 0.42 |
| fp16 | 2 | 1 | TRUE  | TRUE  | 10.4  | 0.66 |
| fp16 | 1 | 1 | FALSE | TRUE  | 11.17 | 1.14 |
| no   | 1 | 1 | FALSE | TRUE  | 11.17 | 0.49 |
| fp16 | 1 | 2 | TRUE  | TRUE  | 11.56 | 1    |
| fp16 | 2 | 1 | FALSE | TRUE  | 13.67 | 0.82 |
| fp16 | 1 | 2 | FALSE | TRUE  | 13.7  | 0.83 |
| fp16 | 1 | 1 | TRUE  | FALSE | 15.79 | 0.77 |
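Note that these flags belong to the diffusers train_dreambooth.py script linked above, not to this repo's main.py. A rough invocation using the lowest-VRAM row of the table might look like the following; the flag names follow the diffusers DreamBooth example (ShivamShrirao's fork may expect slightly different arguments, e.g. a concepts list, so check the script's --help), and the model name, paths, prompts, and step counts here are only placeholders.

```bash
# Lowest-VRAM configuration from the table above: fp16, batch size 1,
# gradient checkpointing, and 8-bit Adam (requires bitsandbytes) -- roughly 10 GB on a T4 in that test.
accelerate launch train_dreambooth.py \
    --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
    --instance_data_dir="./training_images" \
    --class_data_dir="./regularization_images" \
    --output_dir="./dreambooth_out" \
    --with_prior_preservation --prior_loss_weight=1.0 \
    --instance_prompt="a photo of sks person" \
    --class_prompt="a photo of a person" \
    --resolution=512 \
    --train_batch_size=1 \
    --gradient_accumulation_steps=1 \
    --gradient_checkpointing \
    --use_8bit_adam \
    --mixed_precision="fp16" \
    --learning_rate=5e-6 \
    --lr_scheduler="constant" --lr_warmup_steps=0 \
    --num_class_images=200 \
    --max_train_steps=800
```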
wyiguanw commented 7 months ago

Hi, have you been able to run this on a 24 GB GPU? I tried setting all batch sizes to 1 and adding those arguments on the command line, but it does not seem to work: python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume /v1-5-pruned.ckpt -n youtube6 --gpus 0,1 --data_root /youtube6/png --reg_data_root /dreambooth_data/class_man_images --class_word man --gradient_checkpointing True --use_8bit_adam True --fp16 fp16 --gradient_accumulation_steps 1 --train_batch_size 1

Zhangpei226 commented 5 months ago

Can this run on a 24 GB GPU?