toonpeters opened this issue 2 years ago
Same problem here: CUDA out of memory on a 24 GB GPU.
As far as I know, this repo works on GPUs with at least 12.5 GB of VRAM. It works fine on my 3090 Ti. I'm not sure it works with multiple GPUs; I would try with just `--gpus 0,`.
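For instance (a sketch, and only if I'm reading the Lightning `--gpus` syntax right: the trailing comma after `0` selects device index 0, whereas a bare `0` would request zero GPUs), the multi-GPU command reported in this thread restricted to a single GPU would look like:

```bash
# Same command as elsewhere in this thread, but pinned to GPU 0 only.
python main.py \
  --base configs/stable-diffusion/v1-finetune_unfrozen.yaml \
  -t \
  --actual_resume sd-v1-4-full-ema.ckpt \
  -n tr_job \
  --gpus 0, \
  --data_root training_images/ \
  --reg_data_root regularization_images/person_ddim/ \
  --class_word sks
```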
@toonpeters look here:
https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth
Use the table below to choose the best flags for your memory and speed requirements (tested on a Tesla T4 GPU); a sample launch command for the lowest-memory row follows the table.
| fp16 | train_batch_size | gradient_accumulation_steps | gradient_checkpointing | use_8bit_adam | VRAM usage (GB) | Speed (it/s) |
|------|------------------|-----------------------------|------------------------|---------------|-----------------|--------------|
| fp16 | 1 | 1 | TRUE  | TRUE  | 9.92  | 0.93 |
| no   | 1 | 1 | TRUE  | TRUE  | 10.08 | 0.42 |
| fp16 | 2 | 1 | TRUE  | TRUE  | 10.4  | 0.66 |
| fp16 | 1 | 1 | FALSE | TRUE  | 11.17 | 1.14 |
| no   | 1 | 1 | FALSE | TRUE  | 11.17 | 0.49 |
| fp16 | 1 | 2 | TRUE  | TRUE  | 11.56 | 1    |
| fp16 | 2 | 1 | FALSE | TRUE  | 13.67 | 0.82 |
| fp16 | 1 | 2 | FALSE | TRUE  | 13.7  | 0.83 |
| fp16 | 1 | 1 | TRUE  | FALSE | 15.79 | 0.77 |
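Concretely, the first row of that table (fp16, batch size 1, gradient checkpointing, 8-bit Adam, ~9.9 GB) maps to a launch command along these lines for that repo's `train_dreambooth.py`. This is only a sketch based on the flags documented in that example's README; the model name, directories, prompt, and step count are placeholders to replace with your own:

```bash
# Placeholders: adjust the model, data dirs, prompt, and steps for your setup.
export MODEL_NAME="runwayml/stable-diffusion-v1-5"   # or a local checkpoint dir
export INSTANCE_DIR="training_images"
export OUTPUT_DIR="dreambooth_output"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks man" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --mixed_precision="fp16" \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=800
```

Note that these flags belong to the diffusers `train_dreambooth.py` script; as far as I can tell they are not arguments of this repo's `main.py`.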
Hi, have you been able to run this on 24 GB GPUs? I tried setting all batch sizes to 1 and adding those arguments on the command line, but it doesn't seem to work:

```bash
python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t \
  --actual_resume /v1-5-pruned.ckpt -n youtube6 --gpus 0,1 \
  --data_root /youtube6/png --reg_data_root /dreambooth_data/class_man_images \
  --class_word man --gradient_checkpointing True --use_8bit_adam True \
  --fp16 fp16 --gradient_accumulation_steps 1 --train_batch_size 1
```
Can this run on a 24 GB GPU?
When running the main file for training:

```bash
python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t \
  --actual_resume sd-v1-4-full-ema.ckpt -n tr_job --gpus 0,1 \
  --data_root training_images/ --reg_data_root regularization_images/person_ddim/ \
  --class_word sks
```

CUDA keeps running out of memory. What are the options apart from upgrading to a GPU with more VRAM? I tried most of the options above, but nothing works. Does anybody have other options?
OS: