Tuxius opened this issue 2 years ago
You can run it on Windows with WSL2: https://www.youtube.com/watch?v=w6PTviOCYQY&t=15s
Is it possible to run the training on 11 GB VRAM?
I was getting a lot of out-of-memory errors on a 24 GB 3090, so I ended up using a bigger server, where I saw it consume up to 28 GB of memory, peaking at 30 GB at one point.
It could be that some config needs tweaking at runtime, not sure 🤷🏼‍♂️
@ChinaArvin: Thank you, yes, following these instructions it now works nicely, well below 24 GB. However, having to use WSL feels like a workaround, even though I enjoy the Linux command line. Shouldn't it be possible to get this running under native Windows?
@wyang22: If you sacrifice some settings, even less than 11 GB is possible under WSL; just follow the instructions in the video.
I finally ended up using this:
https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth
It indeed does not work with a 3090 on Windows 11, but runs fine under WSL on the same machine with the same (default) config. Must be a bug on Windows then.
@wyang22, see https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth; there are several configurations there:
Use the table below to choose the best flags based on your memory and speed requirements. Tested on a Tesla T4 GPU.
| fp16 | train_batch_size | gradient_accumulation_steps | gradient_checkpointing | use_8bit_adam | GB VRAM usage | Speed (it/s) |
| --- | --- | --- | --- | --- | --- | --- |
| fp16 | 1 | 1 | TRUE | TRUE | 9.92 | 0.93 |
| no | 1 | 1 | TRUE | TRUE | 10.08 | 0.42 |
| fp16 | 2 | 1 | TRUE | TRUE | 10.4 | 0.66 |
| fp16 | 1 | 1 | FALSE | TRUE | 11.17 | 1.14 |
| no | 1 | 1 | FALSE | TRUE | 11.17 | 0.49 |
| fp16 | 1 | 2 | TRUE | TRUE | 11.56 | 1 |
| fp16 | 2 | 1 | FALSE | TRUE | 13.67 | 0.82 |
| fp16 | 1 | 2 | FALSE | TRUE | 13.7 | 0.83 |
| fp16 | 1 | 1 | TRUE | FALSE | 15.79 | 0.77 |
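A minimal launch sketch using the lowest-VRAM row of that table, based on the README in the linked repo (the model name and paths are placeholders, and the prompt and step count are illustrative, not from this thread):

```bash
# Sketch only: flags mirror the 9.92 GB row of the table above.
export MODEL_NAME="runwayml/stable-diffusion-v1-5"   # placeholder model
export INSTANCE_DIR="./instance_images"              # placeholder path
export OUTPUT_DIR="./dreambooth_out"                 # placeholder path

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --mixed_precision="fp16" \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=400
```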
Whatever flags I use, I always get the CUDA out-of-memory error. How are you all running this? Can anyone post an example? I'm trying it on a 4090 with 24 GB.
I'm also having OOM errors with a 3090 with 24 GB. Batch size is set to 1, and I even set the `precision` flag on the Trainer to `16`.
Did anyone ever find a solution? I am also getting this error on a 3090 Ti.
> Whatever flags I use, I always get the CUDA out-of-memory error. How are you all running this? Can anyone post an example? I'm trying it on a 4090 with 24 GB.
I haven't tried it with this repo, but if you are trying to train a 768 model and don't have xformers installed correctly, it will go OOM. Training the 768 model hovers around 21 GB VRAM. I think the 512 models should train fine.
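Not from the thread, but a quick way to sanity-check the xformers install (the `xformers.info` module ships with the package):

```bash
# Verify xformers is installed and built against your local CUDA/PyTorch;
# a broken install silently falls back to plain attention and uses more VRAM.
pip install xformers
python -m xformers.info   # lists the available memory-efficient attention ops
```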
> I finally ended up using this:
> https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth
Yesss!! This finally worked. I am running on 8 GB and got it to train using the info in this section: https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth#training-on-a-8-gb-gpu
DeepSpeed was the final piece that I needed.
Best of luck!
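For anyone following the same route, here is a rough sketch of the 8 GB setup described in that README section. The `accelerate config` answers are paraphrased from memory and the flag values are illustrative, so treat the linked section as authoritative; paths are placeholders:

```bash
# Enable DeepSpeed interactively: ZeRO stage 2 with CPU offload for
# optimizer and parameters, mixed precision fp16 (paraphrased from the README).
accelerate config

# Launch with a minimal memory footprint; paths are placeholders.
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --sample_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --mixed_precision=fp16 \
  --max_train_steps=800
```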
> @wyang22, see https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth; there are several configurations there:
> Use the table below to choose the best flags based on your memory and speed requirements. Tested on a Tesla T4 GPU.
@titusfx Where do I edit these configurations, or where do I pass them in?
Following the instructions exactly, I get an out-of-memory error despite having 24 GB VRAM available.
I tried some changes in `v1-finetune_unfrozen.yaml` (e.g. `num_workers`: from 2 to 1), but no improvement. Has anybody successfully run this under Windows with 24 GB VRAM?
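For reference, the knobs mentioned in this thread live in `v1-finetune_unfrozen.yaml`; a rough sketch of the relevant blocks, with field names as in the ldm-style configs and values illustrative (check your local copy, and note the `precision` line is an assumption based on the Trainer comment earlier in the thread):

```yaml
# Sketch of the relevant parts of v1-finetune_unfrozen.yaml (not the full file).
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 1      # keep at 1 on 24 GB cards
    num_workers: 1     # reduced from 2, as tried above; this affects dataloader
                       # processes (CPU RAM), not VRAM, which is why it didn't help

lightning:
  trainer:
    precision: 16      # assumption: half precision via the Lightning Trainer,
                       # as mentioned earlier in the thread
```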