NVIDIA / modulus-makani

Massively parallel training of machine-learning based weather and climate models

CUDA out of memory for SFNONet #14

Open alv128 opened 3 days ago

alv128 commented 3 days ago

Hi there,

Thank you for the repo. I am trying to run the pretrained SFNONet from Makani for inference on some sample data, and I keep getting CUDA out of memory errors. I am using a single Tesla T4 GPU with 16 GB of VRAM. The model itself only takes around 3 GB of memory and the data only around 300 MB, but once inference reaches the inverse Fourier transform, in particular torch.fft.irfft, memory usage blows up. Is this expected behaviour? I have also set torch.backends.cuda.cufft_plan_cache.max_size to a small number, but the error keeps happening. I am mainly using makani/models/model_package.py to perform the inference.
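A rough sketch of this setup follows; the real SFNONet is loaded through makani/models/model_package.py, so the module and input below are only self-contained stand-ins with assumed shapes (73 channels on a 721 x 1440 grid):

```python
import torch

# Rough sketch of the setup described above. A dummy module and a random
# tensor stand in for the pretrained SFNONet and the sample data, which are
# not shown in this thread.
device = torch.device("cuda")

# Keep the cuFFT plan cache of the current device small so cached FFT plans
# do not accumulate in GPU memory.
torch.backends.cuda.cufft_plan_cache.max_size = 4

model = torch.nn.Identity()        # stand-in for the pretrained SFNONet (~3 GB)
x = torch.randn(1, 73, 721, 1440)  # stand-in for the ~300 MB input batch

model = model.to(device).eval()
with torch.no_grad():
    y = model(x.to(device))
```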

bonevbs commented 3 days ago

Hi @alv128, are you running this under torch.no_grad()? What consumes a lot of memory are the activations, which are only required for the gradient computation. For inference you do not need gradients, so you should avoid storing them by using that mode. PyTorch actually recommends torch.inference_mode(), which also disables the storing of activations.
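A minimal sketch of that suggestion, with `model` and `x` as stand-ins for the loaded SFNONet and the prepared input batch:

```python
import torch

# Run the forward pass with autograd fully disabled so no activations are kept.
model = torch.nn.Identity().cuda().eval()         # stand-in for the loaded SFNONet
x = torch.randn(1, 73, 721, 1440, device="cuda")  # stand-in for the input batch

with torch.inference_mode():  # torch.no_grad() also works; inference_mode is stricter
    prediction = model(x)
```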

alv128 commented 3 days ago

Hi @bonevbs, thank you very much for the quick reply. I was using torch.no_grad() before, and I have now also tried torch.inference_mode(), but it did not make a difference. I am using PyTorch 2.4.1 with CUDA 12.4. It still seems to run out of memory during the computation of the Fourier transform (screenshot attached).
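One generic way to narrow this down (not from the thread, just standard PyTorch memory accounting) is to reset the peak-memory counter right before the forward pass, which separates the roughly 3 GB of resident weights from what the single step itself allocates:

```python
import torch

# Generic diagnostic: measure the extra memory the forward pass needs on top
# of the already-resident weights. `model` and `x` are the loaded SFNONet and
# the input batch from the snippets above.
torch.cuda.reset_peak_memory_stats()
print(f"allocated before forward: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")

with torch.inference_mode():
    y = model(x)

print(f"peak during forward:      {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```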

bonevbs commented 3 days ago

Hmm, this is unfortunate; it might indicate that the model is simply too big for 16 GB of VRAM. You could try using bf16 to see whether it fits. Is this a single inference step or more?
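One way to try the bf16 suggestion is autocast; whether the spectral transforms actually execute in reduced precision depends on the individual ops, so this is a sketch of something to try rather than a guaranteed fix:

```python
import torch

# Run the forward pass under autocast so that eligible ops execute in bfloat16.
# `model` and `x` are the loaded SFNONet and the input batch.
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y = model(x)
```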

alv128 commented 3 days ago

It is a single inference step. Unfortunately, the T4 GPU does not support bf16. It is possible that the model simply cannot be run with 16 GB of VRAM, but I am still surprised, since the model itself only occupies around 3 GB and memory fills up very quickly once the inference step begins. For reference, I am using the sfno_73ch_small model.
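For completeness, the bf16 capability of the device can be checked directly; the T4 is compute capability 7.5, and PyTorch reports whether it considers bf16 usable on the current device:

```python
import torch

# Check the device and whether this PyTorch build considers bf16 usable on it.
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))  # (7, 5) for a Tesla T4
print(torch.cuda.is_bf16_supported())
```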