NVIDIAGameWorks / kaolin-wisp

NVIDIA Kaolin Wisp is a PyTorch library powered by NVIDIA Kaolin Core to work with neural fields (including NeRFs, NGLOD, instant-ngp and VQAD).

`--tracer.raymarch-type voxel` uses too much VRAM, which triggers OutOfMemoryError #193

Open · barikata1984 opened this issue 6 days ago

barikata1984 commented 6 days ago

While investigating #192, I noticed that `--tracer.raymarch-type voxel` triggers an `OutOfMemoryError`, as shown below:

```
other traceback lines
...
  File "/home/atsushi/workspace/wisp211/wisp/tracers/packed_rf_tracer.py", line 130, in trace
    hit_ray_d = rays.dirs.index_select(0, ridx)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.15 GiB (GPU 0; 11.69 GiB total capacity; 10.22 GiB already allocated; 133.44 MiB free; 10.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
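If I read the traceback correctly, the failing `index_select` materializes one ray direction per sample hit, so with `--tracer.raymarch-type voxel` the requested allocation should grow with the total number of ray/voxel intersections. A rough back-of-the-envelope check (my own sketch; I'm assuming `rays.dirs` is an `(N, 3)` float32 tensor, which I haven't verified against the wisp source):

```python
# Back-of-the-envelope check of the 4.15 GiB figure.
# Assumption (mine, not verified against the wisp source): rays.dirs is an
# (N, 3) float32 tensor, so rays.dirs.index_select(0, ridx) materializes a
# new (len(ridx), 3) float32 tensor, i.e. 12 bytes per sample hit.
alloc_bytes = 4.15 * 1024**3      # "Tried to allocate 4.15 GiB"
bytes_per_sample = 3 * 4          # 3 float32 components per ray direction
num_samples = alloc_bytes / bytes_per_sample
print(f"~{num_samples / 1e6:.0f} million sample hits")  # prints "~371 million sample hits"
```

If that assumption holds, the 4.15 GiB request would correspond to roughly 370 million sample hits in a single `trace` call.

For reference, this is the `nvidia-smi` output captured immediately after the failed training run: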
```
❯ nvidia-smi
Sat Jun 29 01:30:32 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 Ti     Off |   00000000:01:00.0  On |                  N/A |
|  0%   40C    P8             14W /  285W |     848MiB /  12282MiB |     41%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1750      G   /usr/lib/xorg/Xorg                            416MiB |
|    0   N/A  N/A      1943    C+G   ...libexec/gnome-remote-desktop-daemon        195MiB |
|    0   N/A  N/A      1995      G   /usr/bin/gnome-shell                           98MiB |
|    0   N/A  N/A      5488      G   ...57,262144 --variations-seed-version        109MiB |
|    0   N/A  N/A      8436      G   /app/bin/wezterm-gui                            9MiB |
+-----------------------------------------------------------------------------------------+
```

As you can see, the tracer tries to allocate another 4.15 GiB while 10.22 GiB are already allocated. I observed similar results regardless of whether the interactive app is loaded. At first I assumed that other applications were simply holding a large amount of VRAM, so I checked their usage by running `nvidia-smi` immediately after trying to train a NeRF; as the output above shows, however, they use less than 1 GiB in total. My current assumption is that the NeRF app keeps requesting large chunks of VRAM one after another and eventually fails. Does anybody know a potential cause of this issue?
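For what it's worth, this is how I plan to verify the sequential-allocation assumption on my side. It's only a rough sketch: the `torch.cuda` statistics calls are standard PyTorch, but the call site (`pipeline`, `rays`) is a placeholder for whatever the trainer actually passes into the tracer, not real wisp variable names.

```python
import torch

GIB = 1024**3

def log_cuda_mem(tag: str) -> None:
    # Standard torch.cuda allocator statistics, reported in GiB.
    print(f"[{tag}] "
          f"allocated={torch.cuda.memory_allocated() / GIB:.2f} GiB, "
          f"reserved={torch.cuda.memory_reserved() / GIB:.2f} GiB, "
          f"peak={torch.cuda.max_memory_allocated() / GIB:.2f} GiB")

# Hypothetical call site -- `pipeline` and `rays` are placeholders,
# not actual wisp variable names.
log_cuda_mem("before trace")
# rb = pipeline.tracer(pipeline.nef, rays=rays)
log_cuda_mem("after trace")
```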

Thanks in advance!