gafniguy / 4D-Facial-Avatars

Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction

Getting cuda memory issue. How to resolve this error? #40

Closed deeprobo-dev closed 2 years ago

deeprobo-dev commented 2 years ago

Hi, I am getting "RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 5.80 GiB total capacity; 4.51 GiB already allocated; 34.31 MiB free; 4.57 GiB reserved in total by PyTorch)" when running the train_transformed_rays.py script.

Please find the error details below:

before signal registration
after registration
starting data loading
Done with data loading
done loading data
loading GT background to condition on
bg shape torch.Size([512, 512, 3])
should be  torch.Size([512, 512, 3])
initialized latent codes with shape 56 X 32
computing boundix boxes probability maps
Starting loop
  0%|                                                                                                                        | 0/1000000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train_transformed_rays.py", line 608, in <module>
    main()
  File "train_transformed_rays.py", line 342, in main
    rgb_coarse, _, _, rgb_fine, _, _, weights = run_one_iter_of_nerf(
  File "/home/santu/4D-Facial-Avatars/nerface_code/nerf-pytorch/nerf/train_utils.py", line 228, in run_one_iter_of_nerf
    pred = [
  File "/home/santu/4D-Facial-Avatars/nerface_code/nerf-pytorch/nerf/train_utils.py", line 229, in <listcomp>
    predict_and_render_radiance(
  File "/home/santu/4D-Facial-Avatars/nerface_code/nerf-pytorch/nerf/train_utils.py", line 129, in predict_and_render_radiance
    radiance_field = run_network(
  File "/home/santu/4D-Facial-Avatars/nerface_code/nerf-pytorch/nerf/train_utils.py", line 24, in run_network
    preds = [network_fn(batch, expressions, latent_code) for batch in batches]
  File "/home/santu/4D-Facial-Avatars/nerface_code/nerf-pytorch/nerf/train_utils.py", line 24, in <listcomp>
    preds = [network_fn(batch, expressions, latent_code) for batch in batches]
  File "/home/santu/miniconda3/envs/nerf2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/santu/4D-Facial-Avatars/nerface_code/nerf-pytorch/nerf/models.py", line 249, in forward
    x = self.relu(x)
  File "/home/santu/miniconda3/envs/nerf2/lib/python3.8/site-packages/torch/nn/functional.py", line 1063, in relu
    result = torch.relu(input)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 5.80 GiB total capacity; 4.58 GiB already allocated; 46.19 MiB free; 4.64 GiB reserved in total by PyTorch)

Note: my system configuration is an i7 CPU, 1 TB SSD, 16 GB RAM, and a GTX 2060 GPU. Can you please confirm whether the system configuration is the problem (or something else), and if so, what are the minimum system requirements?

gafniguy commented 2 years ago

The configs are designed to train on GPUs with more memory, like 2080s. Since your GPU has half the memory, you should reduce the load by about half. You can do so by changing, in the config file, the values of nerf.train.num_random_rays (the number of rays sampled per image in an iteration) and nerf.train.chunksize (which divides the MLP queries into chunks; you can halve it without reducing the number of pixels rays are shot through). Play around with these until training fits in your memory.
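To illustrate why chunksize trades memory for iterations without changing the result: the renderer splits the flattened batch of point queries into chunks and runs the MLP on one chunk at a time, so only one chunk's activations occupy GPU memory. A minimal sketch of that pattern (the toy MLP and shapes here are hypothetical, not the repo's actual model):

```python
import torch

def run_chunked(network_fn, inputs, chunksize):
    """Evaluate network_fn over inputs in chunks of size `chunksize`,
    so only one chunk's activations are resident at a time."""
    batches = [inputs[i : i + chunksize] for i in range(0, inputs.shape[0], chunksize)]
    preds = [network_fn(batch) for batch in batches]
    return torch.cat(preds, dim=0)

# Toy stand-in for the NeRF MLP (hypothetical, for illustration only):
# 3D point in, RGB + density out.
mlp = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 4)
)
points = torch.randn(10000, 3)
out = run_chunked(mlp, points, chunksize=2048)
print(out.shape)  # torch.Size([10000, 4])
```

Halving chunksize roughly halves the peak activation memory per forward pass; the outputs are concatenated, so the rendered result is unchanged.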

Do the same for nerf.eval.chunksize if necessary.
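For reference, the edit would look roughly like this in the experiment config. The key names come from the comment above; the concrete values are only a guess for a 6 GB card, not the repo's defaults, so tune them until training fits:

```yaml
# Hypothetical config excerpt -- halve these until the OOM goes away.
nerf:
  train:
    num_random_rays: 1024   # e.g. half of a 2048 default
    chunksize: 1024         # e.g. half of a 2048 default
  eval:
    chunksize: 1024
```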

deeprobo-dev commented 2 years ago

Thanks a lot. I am now able to run it.