NVIDIAGameWorks / kaolin-wisp

NVIDIA Kaolin Wisp is a PyTorch library powered by NVIDIA Kaolin Core to work with neural fields (including NeRFs, NGLOD, instant-ngp and VQAD).

`uniform_sample_cuda` fails when `spc_render.unbatched_raytrace` return empty tensors for `ridx`, `pidx`, and `depth` #192

Open barikata1984 opened 6 days ago

barikata1984 commented 6 days ago

Hi everyone,

I am trying to switch my environment from torch 1.13.1 + cuda117 to torch 2.1.1 + cuda121. After installation, I trained a NeRF with `--tracer.raymarch-type uniform`, but it failed with the error below:

```
[i] Using PYGLFW_IMGUI (GL 3.3)
2024-06-29 01:08:12,092|    INFO| [i] Using PYGLFW_IMGUI (GL 3.3)
[i] Running at 60 frames/second
2024-06-29 01:08:12,110|    INFO| [i] Running at 60 frames/second
rays.origins=tensor([[6., 6., 6.],
        [6., 6., 6.],
        [6., 6., 6.],
        ...,
        [6., 6., 6.],
        [6., 6., 6.],
        [6., 6., 6.]], device='cuda:0')
((rays.origins < -1) | (1 < rays.origins)).all(dim=1)=([True, True, True,  ..., True, True, True], device='cuda:0')
ridx=tensor([], device='cuda:0', dtype=torch.int32)
pidx=tensor([], device='cuda:0', dtype=torch.int32)
depth=tensor([], device='cuda:0', size=(0, 2))
non_zero_elements=tensor([], device='cuda:0', dtype=torch.bool)
filtered_ridx.shape=torch.Size([0])
filtered_depth.shape=torch.Size([0, 2])
insum.shape=torch.Size([0])
Traceback (most recent call last):
  File "/home/atsushi/workspace/wisp211/app/nerf/main_nerf.py", line 131, in <module>
    app.run()  # Run in interactive mode
  File "/home/atsushi/workspace/wisp211/wisp/renderer/app/wisp_app.py", line 267, in run
    app.run()   # App clock should always run as frequently as possible (background tasks should not be limited)
...
other traceback lines
...
  File "/home/atsushi/workspace/wisp211/wisp/tracers/base_tracer.py", line 161, in forward
    rb = self.trace(nef, rays, requested_channels, requested_extra_channels, **input_args)
  File "/home/atsushi/workspace/wisp211/wisp/tracers/packed_rf_tracer.py", line 117, in trace
    raymarch_results = nef.grid.raymarch(rays,
  File "/home/atsushi/workspace/wisp211/wisp/models/grids/hash_grid.py", line 240, in raymarch
    return self.blas.raymarch(rays, raymarch_type=raymarch_type, num_samples=num_samples, level=self.blas.max_level)
  File "/home/atsushi/workspace/wisp211/wisp/accelstructs/octree_as.py", line 436, in raymarch
    raymarch_results = self._raymarch_uniform(rays=rays, num_samples=num_samples, level=level)
  File "/home/atsushi/workspace/wisp211/wisp/accelstructs/octree_as.py", line 365, in _raymarch_uniform
    results = wisp_C.ops.uniform_sample_cuda(scale, filtered_ridx.contiguous(), filtered_depth, insum)
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```

I looked into `_raymarch_uniform` and found that `uniform_sample_cuda` fails when `spc_render.unbatched_raytrace` returns empty tensors for `ridx`, `pidx`, and `depth`, as shown in the first half of the log above. I confirmed that `ridx`, `pidx`, and `depth` can also be empty with torch 1.13.1 and cuda117, yet that combination never raised this error; with torch 1.13.1 and cuda118, however, the error appears again. So I believe the behaviour of `uniform_sample_cuda` differs between cuda117 and later versions. If I had CUDA programming experience I would debug the method myself, but I do not. Could anybody look into it?
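The failure mode is consistent with a zero-sized kernel launch: when `filtered_ridx` is empty, the grid dimension derived from it is presumably 0, which CUDA rejects with `invalid configuration argument`. As a workaround until the extension itself is fixed, a Python-side guard could short-circuit the call before it reaches the kernel. This is only a sketch: the wrapper name and the shapes/dtypes of the empty return values are my assumptions, not the actual return signature of `uniform_sample_cuda` in wisp.

```python
import torch

def safe_uniform_sample(scale, filtered_ridx, filtered_depth, insum, uniform_sample_cuda):
    """Guard sketch: skip the CUDA kernel when the raytrace produced no
    ray/octree intersections (empty ridx/depth). Launching the kernel with
    zero work items is what appears to trigger the
    "invalid configuration argument" error on newer CUDA versions.

    `uniform_sample_cuda` would be `wisp_C.ops.uniform_sample_cuda` in wisp;
    it is passed in here so the sketch stays self-contained.
    """
    if filtered_ridx.numel() == 0:
        # Return empty tensors instead of launching a zero-block kernel.
        # NOTE: these shapes/dtypes are assumptions about what the caller
        # expects (per-sample ray indices, depths, and a boundary mask).
        device = filtered_ridx.device
        return (torch.empty(0, dtype=torch.int32, device=device),
                torch.empty(0, 1, device=device),
                torch.empty(0, dtype=torch.bool, device=device))
    return uniform_sample_cuda(scale, filtered_ridx.contiguous(), filtered_depth, insum)
```

With this guard in place, rays that miss the octree entirely (like the out-of-bounds `[6., 6., 6.]` origins in the log) would simply produce zero samples instead of crashing.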

Thanks in advance!