Hi everyone,

I am trying to switch my environment from torch 1.13.1 + CUDA 11.7 to torch 2.1.1 + CUDA 12.1. After installation, I trained a NeRF with `--tracer.raymarch-type uniform`, but it failed with an error message like the one below:
```
[i] Using PYGLFW_IMGUI (GL 3.3)
2024-06-29 01:08:12,092| INFO| [i] Using PYGLFW_IMGUI (GL 3.3)
[i] Running at 60 frames/second
2024-06-29 01:08:12,110| INFO| [i] Running at 60 frames/second
rays.origins=tensor([[6., 6., 6.],
        [6., 6., 6.],
        [6., 6., 6.],
        ...,
        [6., 6., 6.],
        [6., 6., 6.],
        [6., 6., 6.]], device='cuda:0')
((rays.origins < -1) | (1 < rays.origins)).all(dim=1)=tensor([True, True, True, ..., True, True, True], device='cuda:0')
ridx=tensor([], device='cuda:0', dtype=torch.int32)
pidx=tensor([], device='cuda:0', dtype=torch.int32)
depth=tensor([], device='cuda:0', size=(0, 2))
non_zero_elements=tensor([], device='cuda:0', dtype=torch.bool)
filtered_ridx.shape=torch.Size([0])
filtered_depth.shape=torch.Size([0, 2])
insum.shape=torch.Size([0])
Traceback (most recent call last):
  File "/home/atsushi/workspace/wisp211/app/nerf/main_nerf.py", line 131, in <module>
    app.run()  # Run in interactive mode
  File "/home/atsushi/workspace/wisp211/wisp/renderer/app/wisp_app.py", line 267, in run
    app.run()  # App clock should always run as frequently as possible (background tasks should not be limited)
  ...
  other traceback lines
  ...
  File "/home/atsushi/workspace/wisp211/wisp/tracers/base_tracer.py", line 161, in forward
    rb = self.trace(nef, rays, requested_channels, requested_extra_channels, **input_args)
  File "/home/atsushi/workspace/wisp211/wisp/tracers/packed_rf_tracer.py", line 117, in trace
    raymarch_results = nef.grid.raymarch(rays,
  File "/home/atsushi/workspace/wisp211/wisp/models/grids/hash_grid.py", line 240, in raymarch
    return self.blas.raymarch(rays, raymarch_type=raymarch_type, num_samples=num_samples, level=self.blas.max_level)
  File "/home/atsushi/workspace/wisp211/wisp/accelstructs/octree_as.py", line 436, in raymarch
    raymarch_results = self._raymarch_uniform(rays=rays, num_samples=num_samples, level=level)
  File "/home/atsushi/workspace/wisp211/wisp/accelstructs/octree_as.py", line 365, in _raymarch_uniform
    results = wisp_C.ops.uniform_sample_cuda(scale, filtered_ridx.contiguous(), filtered_depth, insum)
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
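For context on those debug prints: every ray origin sits at (6, 6, 6), outside the [-1, 1] grid bounds the check tests against, which is consistent with the trace coming back empty. A minimal snippet mirroring just that check (the bounds and values are copied from the prints above, not from the wisp source):

```python
import torch

# Ray origins copied from the debug print; the [-1, 1] bounds match the
# condition printed above. Every component lies outside the bounds, so the
# octree trace has no intersections and ridx/pidx/depth all come back empty.
origins = torch.full((8, 3), 6.0, device='cuda')
outside = ((origins < -1) | (1 < origins)).all(dim=1)
print(outside.all().item())  # True -> every ray is flagged outside the grid
```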
I looked into `_raymarch_uniform` and found that `uniform_sample_cuda` fails when `spc_render.unbatched_raytrace` returns empty tensors for `ridx`, `pidx`, and `depth`, as the debug prints in the first half of the log show. I confirmed that `ridx`, `pidx`, and `depth` can also come back empty with torch 1.13.1 + CUDA 11.7, yet I never saw this error there. On the other hand, I did hit the error with torch 1.13.1 + CUDA 11.8. So I believe `uniform_sample_cuda` behaves differently on CUDA 11.7 than on later versions; my guess is that the empty inputs lead to an invalid kernel launch configuration (for example, a zero-sized grid) that the newer CUDA runtimes reject. If I had experience with CUDA coding I could debug the method myself, but I do not know CUDA yet. Could anybody help debug it?
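Until someone who knows the CUDA side can look at `uniform_sample_cuda` itself, a guard in `_raymarch_uniform` seems like a plausible workaround. This is only a sketch: `make_empty_raymarch_results` is a hypothetical helper (wisp has no such function as far as I know), and the assumption that the zero-sized launch is what triggers the error is mine:

```python
# Sketch of a guard just before the uniform_sample_cuda call in
# OctreeAS._raymarch_uniform (wisp/accelstructs/octree_as.py, around line 365).
# Assumption: tracing zero rays yields a zero-sized launch grid, which the
# newer CUDA runtimes reject with "invalid configuration argument".
if filtered_ridx.numel() == 0:
    # Nothing intersected the octree; skip the kernel launch and hand back
    # empty buffers shaped like uniform_sample_cuda's outputs.
    results = make_empty_raymarch_results()  # hypothetical helper, not in wisp
else:
    results = wisp_C.ops.uniform_sample_cuda(
        scale, filtered_ridx.contiguous(), filtered_depth, insum)
```

Even with this guard, the rest of `_raymarch_uniform` and its callers would have to tolerate the empty buffers, so treat it as a starting point rather than a fix.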
Thanks in advance!