CUDA error: out of memory

PeterouZh / CIPS-3D

3D-aware GANs based on NeRF (arXiv).

MIT License

610 stars, 60 forks

CUDA error: out of memory #19

Status: Closed (closed by longnhatne 2 years ago)

longnhatne commented 2 years ago

Hi, I get a CUDA error: out of memory (even with batch size = 1) when I run the training script with this command:

CUDA_VISIBLE_DEVICES=2 python -c "import sys; sys.path.append('./'); from exp.tests.test_cips3d import Testing_ffhq_exp; Testing_ffhq_exp().test_train_ffhq(debug=False)" --tl_opts batch_size 1 img_size 32 total_iters 80000

I am running on a V100 GPU with 32 GB of memory. What should I do? Btw, I really appreciate your work, it's a great paper. 👏

PeterouZh commented 2 years ago

How about running export CUDA_VISIBLE_DEVICES=2 first, instead of prefixing the command with it?
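For readers comparing the two ways of setting the variable: either form should end up in the environment of the Python process, which is what the CUDA runtime reads. A minimal sketch of that idea, using only the standard library; the helper name run_with_visible_gpus is hypothetical, and the child command is a stand-in that just echoes the variable rather than the real training script.

```python
import os
import subprocess
import sys


def run_with_visible_gpus(gpu_ids, argv):
    """Run argv in a child process that only sees the given GPU ids.

    Hypothetical helper: it copies the current environment and sets
    CUDA_VISIBLE_DEVICES, the same effect as prefixing the command
    with the variable or exporting it in the shell beforehand.
    """
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


# Stand-in child process: prints the value it inherited.
child = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
result = run_with_visible_gpus([2], child)
print(result.stdout.strip())  # prints: 2
```

If the variable is set from inside a Python script instead, it generally has to be set before CUDA is initialized in that process, otherwise it has no effect.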

longnhatne commented 2 years ago

Still the same :((

PeterouZh commented 2 years ago

The error seems to be raised by .to(device). Please check whether torch can use the GPU by calling torch.cuda.is_available().
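The check suggested above can be extended slightly so that it also reports free memory and tries a tiny allocation on each visible device. This is only a sketch: the function name cuda_report is hypothetical, and it assumes a PyTorch version that provides torch.cuda.mem_get_info. Device indices are relative to CUDA_VISIBLE_DEVICES, so with CUDA_VISIBLE_DEVICES=2 the physical GPU 2 shows up as index 0.

```python
def cuda_report():
    """Summarize what PyTorch can see on this machine (hypothetical helper)."""
    try:
        import torch  # imported lazily so the sketch also runs where torch is absent
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    report = {
        "torch_installed": True,
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if not report["cuda_available"]:
        return report

    for index in range(torch.cuda.device_count()):
        entry = {"index": index}
        try:
            entry["name"] = torch.cuda.get_device_name(index)
            # mem_get_info returns (free_bytes, total_bytes) for the device.
            free_bytes, total_bytes = torch.cuda.mem_get_info(index)
            entry["free_gib"] = round(free_bytes / 2**30, 2)
            entry["total_gib"] = round(total_bytes / 2**30, 2)
            # A one-element tensor is enough to exercise .to(device).
            torch.zeros(1).to(f"cuda:{index}")
            entry["ok"] = True
        except Exception as exc:  # a faulty or fully occupied GPU may fail here
            entry["ok"] = False
            entry["error"] = str(exc)
        report["devices"].append(entry)
    return report


print(cuda_report())
```

A device that reports ok as False even for a one-element tensor points at the GPU or its current load rather than at the model size.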

longnhatne commented 2 years ago

torch.cuda.is_available() returns True. How much memory does the model take on your machine? I don't think it should need more than 32 GB :((
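When an out-of-memory error appears even at batch size 1, one possible cause (not confirmed in this thread) is that the selected GPU is already occupied by other processes. A rough sketch for checking this from Python by parsing nvidia-smi output: the helper names are hypothetical, the sample numbers are invented for illustration, and the parser assumes every field is a plain integer.

```python
import subprocess

# nvidia-smi query that prints one "index, used, total" line per GPU (MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse the CSV text produced by QUERY into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        gpus.append({"index": int(index), "used_mib": int(used), "total_mib": int(total)})
    return gpus


def least_used_gpu(gpus):
    """Return the index of the GPU with the least memory in use."""
    return min(gpus, key=lambda gpu: gpu["used_mib"])["index"]


def query_gpus():
    """Run nvidia-smi and parse its output (requires the NVIDIA driver tools)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)


# Invented sample: GPU 2 is almost full, GPU 1 is idle.
sample = "0, 1200, 32510\n1, 3, 32510\n2, 32100, 32510\n"
print(least_used_gpu(parse_gpu_memory(sample)))  # prints: 1
```

On a real machine, query_gpus() would replace the sample string, and the returned index could be passed to CUDA_VISIBLE_DEVICES.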

PeterouZh commented 2 years ago

I think 32 GB is enough to run the program. How about following the hint in the error message and setting CUDA_LAUNCH_BLOCKING=1?

longnhatne commented 2 years ago

It seems that something is wrong with my GPU. I switched to another one and it works!

Btw, there are many scripts here (ffhq_exp, ffhq_exp_v1...). What is the difference, and which one should I use?

PeterouZh commented 2 years ago

Hi, I have added running instructions in the readme.