NVIDIAGameWorks / kaolin-wisp

NVIDIA Kaolin Wisp is a PyTorch library powered by NVIDIA Kaolin Core to work with neural fields (including NeRFs, NGLOD, instant-ngp and VQAD).

`uniform_sample_cuda` fails when `spc_render.unbatched_raytrace` return empty tensors for `ridx`, `pidx`, and `depth` #192

Open barikata1984 opened 6 days ago

barikata1984 commented 6 days ago

Hi everyone,

I am trying to switch my environment from torch 1.13.1 + cuda117 to torch 2.1.1 + cuda121. After installation, I trained a NeRF with `--tracer.raymarch-type uniform`, but it failed with the error below:

```
[i] Using PYGLFW_IMGUI (GL 3.3)
2024-06-29 01:08:12,092|    INFO| [i] Using PYGLFW_IMGUI (GL 3.3)
[i] Running at 60 frames/second
2024-06-29 01:08:12,110|    INFO| [i] Running at 60 frames/second
rays.origins=tensor([[6., 6., 6.],
        [6., 6., 6.],
        [6., 6., 6.],
        ...,
        [6., 6., 6.],
        [6., 6., 6.],
        [6., 6., 6.]], device='cuda:0')
((rays.origins < -1) | (1 < rays.origins)).all(dim=1)=([True, True, True,  ..., True, True, True], device='cuda:0')
ridx=tensor([], device='cuda:0', dtype=torch.int32)
pidx=tensor([], device='cuda:0', dtype=torch.int32)
depth=tensor([], device='cuda:0', size=(0, 2))
non_zero_elements=tensor([], device='cuda:0', dtype=torch.bool)
filtered_ridx.shape=torch.Size([0])
filtered_depth.shape=torch.Size([0, 2])
insum.shape=torch.Size([0])
Traceback (most recent call last):
  File "/home/atsushi/workspace/wisp211/app/nerf/main_nerf.py", line 131, in <module>
    app.run()  # Run in interactive mode
  File "/home/atsushi/workspace/wisp211/wisp/renderer/app/wisp_app.py", line 267, in run
    app.run()   # App clock should always run as frequently as possible (background tasks should not be limited)
...
other traceback lines
...
  File "/home/atsushi/workspace/wisp211/wisp/tracers/base_tracer.py", line 161, in forward
    rb = self.trace(nef, rays, requested_channels, requested_extra_channels, **input_args)
  File "/home/atsushi/workspace/wisp211/wisp/tracers/packed_rf_tracer.py", line 117, in trace
    raymarch_results = nef.grid.raymarch(rays,
  File "/home/atsushi/workspace/wisp211/wisp/models/grids/hash_grid.py", line 240, in raymarch
    return self.blas.raymarch(rays, raymarch_type=raymarch_type, num_samples=num_samples, level=self.blas.max_level)
  File "/home/atsushi/workspace/wisp211/wisp/accelstructs/octree_as.py", line 436, in raymarch
    raymarch_results = self._raymarch_uniform(rays=rays, num_samples=num_samples, level=level)
  File "/home/atsushi/workspace/wisp211/wisp/accelstructs/octree_as.py", line 365, in _raymarch_uniform
    results = wisp_C.ops.uniform_sample_cuda(scale, filtered_ridx.contiguous(), filtered_depth, insum)
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```

I looked into `_raymarch_uniform` and found that `uniform_sample_cuda` fails when `spc_render.unbatched_raytrace` returns empty tensors for `ridx`, `pidx`, and `depth`, as shown in the first half of the log above. I confirmed that `ridx`, `pidx`, and `depth` can also be empty with torch 1.13.1 and cuda117, yet that combination never raised this error; with torch 1.13.1 and cuda118, however, the error appears again. So I believe the behaviour of `uniform_sample_cuda` differs between cuda117 and later versions. If I had CUDA programming experience I would debug the method myself, but I do not. Could anybody look into it?
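The failure mode is consistent with a zero-sized kernel launch: when `filtered_ridx` is empty, the grid dimension derived from it is presumably 0, which CUDA rejects with `invalid configuration argument`. As a workaround until the extension itself is fixed, a Python-side guard could short-circuit the call before it reaches the kernel. This is only a sketch: the wrapper name and the shapes/dtypes of the empty return values are my assumptions, not the actual return signature of `uniform_sample_cuda` in wisp.

```python
import torch

def safe_uniform_sample(scale, filtered_ridx, filtered_depth, insum, uniform_sample_cuda):
    """Guard sketch: skip the CUDA kernel when the raytrace produced no
    ray/octree intersections (empty ridx/depth). Launching the kernel with
    zero work items is what appears to trigger the
    "invalid configuration argument" error on newer CUDA versions.

    `uniform_sample_cuda` would be `wisp_C.ops.uniform_sample_cuda` in wisp;
    it is passed in here so the sketch stays self-contained.
    """
    if filtered_ridx.numel() == 0:
        # Return empty tensors instead of launching a zero-block kernel.
        # NOTE: these shapes/dtypes are assumptions about what the caller
        # expects (per-sample ray indices, depths, and a boundary mask).
        device = filtered_ridx.device
        return (torch.empty(0, dtype=torch.int32, device=device),
                torch.empty(0, 1, device=device),
                torch.empty(0, dtype=torch.bool, device=device))
    return uniform_sample_cuda(scale, filtered_ridx.contiguous(), filtered_depth, insum)
```

With this guard in place, rays that miss the octree entirely (like the out-of-bounds `[6., 6., 6.]` origins in the log) would simply produce zero samples instead of crashing.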

Thanks in advance!