Xharlie / pointnerf

Point-NeRF: Point-based Neural Radiance Fields

How much GPU memory is needed to train? #99

Closed. CTouch closed this issue 11 months ago.

CTouch commented 11 months ago

I failed to train the ScanNet scene241 on a GeForce RTX 3080 (10 GB). I ran the command bash dev_scripts/w_scannet_etf/scene241.sh without the provided checkpoints/scannet/scene241/* files, and training ran out of GPU memory:

Traceback (most recent call last):
  File "train_ft.py", line 1091, in <module>
    main()
  File "train_ft.py", line 947, in main
    model.optimize_parameters(total_steps=total_steps)
  File "/home/touch/PycharmProjects/myfork/pointnerf/run/../models/neural_points_volumetric_model.py", line 217, in optimize_parameters
    self.backward(total_steps)
  File "/home/touch/PycharmProjects/myfork/pointnerf/run/../models/mvs_points_volumetric_model.py", line 104, in backward
    self.loss_total.backward()
  File "/home/touch/miniconda3/envs/pointnerf/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/touch/miniconda3/envs/pointnerf/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 588.00 MiB (GPU 0; 9.77 GiB total capacity; 5.89 GiB already allocated; 247.06 MiB free; 7.63 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
end loading
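
One thing worth trying before moving to a larger GPU: the error message itself suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF to reduce allocator fragmentation. Below is a minimal sketch, not verified on this repo; the random_sample_size option is an assumption about the training scripts, not something confirmed in this thread.

```bash
# Allocator hint taken directly from the CUDA OOM message above.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
bash dev_scripts/w_scannet_etf/scene241.sh

# Assumption: if the script exposes a per-step ray-batch option such as
# random_sample_size, lowering it (for example from the script default to 32)
# reduces peak GPU memory at the cost of slower convergence; edit the
# corresponding variable inside dev_scripts/w_scannet_etf/scene241.sh.
```

Even with these changes, 10 GB may simply be too little for this scene; reducing the per-step ray count is the main lever if upgrading the GPU is not an option.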