NVlabs / nvdiffrast

Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

Crash in call to texture with mipmap #129

Closed sweeneychris closed 9 months ago

sweeneychris commented 11 months ago

I am trying to create a mipmap and sample it with a set of input u, v, level using the texture() call. This is being done while running a 3D web visualizer (viser, which is part of the NerfStudio project). When I use the viewer at the same time as training my model that calls texture(), I immediately get a crash. If I do not use the 3D web viewer, then training completes without problems. I believe it may have something to do with a conflicting OpenGL context, but I don't see a way to access the GL context for texture/mipmaps. Here is the stack trace. Could you advise how we might solve this issue?

Traceback (most recent call last):
  File "/usr/lib64/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/sweeneychris/projects/nerfstudio/nerfstudio/viewer/server/render_state_machine.py", line 173, in run
    outputs = self._render_img(action.cam_msg)
  File "/home/sweeneychris/projects/nerfstudio/nerfstudio/viewer/server/render_state_machine.py", line 148, in _render_img
    outputs = self.viewer.get_model().get_outputs_for_camera_ray_bundle(camera_ray_bundle)
  File "/home/sweeneychris/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/sweeneychris/projects/nerfstudio/nerfstudio/models/base_model.py", line 179, in get_outputs_for_camera_ray_bundle
    outputs = self.forward(ray_bundle=ray_bundle)
  File "/home/sweeneychris/projects/nerfstudio/nerfstudio/models/base_model.py", line 142, in forward
    return self.get_outputs(ray_bundle)
  File "/home/sweeneychris/projects/trimiprf_nerfstudio/trimiprf/trimiprf.py", line 297, in get_outputs
    ray_samples, weights_list, ray_samples_list = self.proposal_sampler(
  File "/home/sweeneychris/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sweeneychris/projects/nerfstudio/nerfstudio/model_components/ray_samplers.py", line 50, in forward
    return self.generate_ray_samples(*args, **kwargs)
  File "/home/sweeneychris/projects/nerfstudio/nerfstudio/model_components/ray_samplers.py", line 602, in generate_ray_samples
    density = density_fns[i_level](ray_samples.frustums.get_positions())
  File "/home/sweeneychris/projects/trimiprf_nerfstudio/trimiprf/trimip_field.py", line 369, in density_fn
    density, _ = self.get_density(ray_samples)
  File "/home/sweeneychris/projects/trimiprf_nerfstudio/trimiprf/trimip_field.py", line 391, in get_density
    features = interpolate_ms_features(
  File "/home/sweeneychris/projects/trimiprf_nerfstudio/trimiprf/trimip_field.py", line 63, in interpolate_ms_features
    grid_features = grid(pts)
  File "/home/sweeneychris/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sweeneychris/projects/trimiprf_nerfstudio/trimiprf/trimip_encoding.py", line 74, in forward
    plane_features = texture(self.plane_coef, plane_coord,
  File "/home/sweeneychris/.local/lib/python3.10/site-packages/nvdiffrast/torch/ops.py", line 615, in texture
    return _texture_func.apply(filter_mode, tex, uv, filter_mode_enum, boundary_mode_enum)
  File "/home/sweeneychris/.local/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/sweeneychris/.local/lib/python3.10/site-packages/nvdiffrast/torch/ops.py", line 504, in forward
    out = _get_plugin().texture_fwd(tex, uv, filter_mode_enum, boundary_mode_enum)
RuntimeError: Cuda error: 9[cudaLaunchKernel(func_tbl[func_idx], gridSize, blockSize, args, 0, stream);]

s-laine commented 11 months ago

The texture op is implemented fully in Cuda and it doesn't use the OpenGL context. The only op that does so is rasterize(), and only if you're using RasterizeGLContext and not RasterizeCudaContext. This makes me think the crash is probably related to Cuda rather than OpenGL.

First try running the code with environment variable CUDA_LAUNCH_BLOCKING=1 set. This makes each launch wait until completion, so that you can be certain which launch causes the crash. The thinking below applies to any of the ops, but I'm assuming the texturing op here.
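As a minimal sketch of the step above, the variable can be set from Python, assuming this runs before anything initializes Cuda (safest is before importing torch). Setting it in the shell that launches the training script works equally well.

```python
import os

# Must run before the first Cuda call (safest: before importing torch),
# because the Cuda runtime reads the variable when it initializes.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Import torch, nvdiffrast, and other Cuda-using libraries only after this point.
print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

With blocking launches enabled, the Python stack trace should point at the op whose kernel actually failed rather than at a later, unrelated call.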

Cuda error code 9 corresponds to cudaErrorInvalidConfiguration, indicating that a kernel launch requires more resources than the device has, or the launch specification is otherwise somehow invalid. Hence there is something wrong with either the launch or the device.
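For reference, here is a small sketch that pulls the numeric code out of a message formatted like the one in the stack trace and looks it up in a short excerpt of the Cuda runtime error table. The helper name `decode_cuda_error` and the table are illustrative assumptions, not part of nvdiffrast, and the table is deliberately incomplete (numbering as in recent Cuda toolkits).

```python
import re

# Short excerpt of Cuda runtime error codes; not a complete table.
CUDA_ERROR_NAMES = {
    1: "cudaErrorInvalidValue",
    2: "cudaErrorMemoryAllocation",
    9: "cudaErrorInvalidConfiguration",
    700: "cudaErrorIllegalAddress",
    701: "cudaErrorLaunchOutOfResources",
    719: "cudaErrorLaunchFailure",
}


def decode_cuda_error(message):
    """Return (code, name) parsed from a 'Cuda error: N[...]' message, or None."""
    match = re.search(r"Cuda error:\s*(\d+)", message)
    if match is None:
        return None
    code = int(match.group(1))
    return code, CUDA_ERROR_NAMES.get(code, "unknown (not in this excerpt)")


msg = ("RuntimeError: Cuda error: 9[cudaLaunchKernel(func_tbl[func_idx], "
       "gridSize, blockSize, args, 0, stream);]")
decoded = decode_cuda_error(msg)
print(decoded)
```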

You should double-check that you're executing the texturing op with similar parameters in both scenarios (working vs. crashing). It is possible that something like a zero dimension in some of the inputs could cause a launch with zero threads, which I believe can cause this error. The same goes for extremely large input tensors. There could also be an exotic combination of filtering/boundary modes, etc., that somehow trips up the kernel selection logic, but these would need to be different to explain why there's a crash in one scenario and not in the other.

Another possibility is that you're running the op on a different Cuda device in the different scenarios. Maybe your setup has a beefy discrete GPU and a stripped-down integrated GPU, and you end up on the latter if the viewer is enabled. The device used is chosen based on where the tex input tensor is located.
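The shape and device checks from the two paragraphs above can be sketched as a small logging helper that is called right before texture() in both the working and the crashing scenario. The name `summarize_texture_inputs` is hypothetical; it only assumes that each input exposes `.shape` and `.device`, as PyTorch tensors do. The stand-in objects below exist only so the example runs without a GPU.

```python
from types import SimpleNamespace


def summarize_texture_inputs(**tensors):
    """Return (summary_lines, problems) for objects exposing .shape and .device."""
    lines, problems, devices = [], [], set()
    for name, t in tensors.items():
        if t is None:  # optional inputs may be absent
            continue
        shape = tuple(t.shape)
        device = str(t.device)
        devices.add(device)
        lines.append(f"{name}: shape={shape} device={device}")
        # A zero-sized dimension could lead to a launch with zero threads.
        if any(dim == 0 for dim in shape):
            problems.append(f"{name} has a zero-sized dimension: {shape}")
    if len(devices) > 1:
        problems.append(f"inputs are on different devices: {sorted(devices)}")
    return lines, problems


# Stand-ins for real tensors; in practice pass the actual tex and uv tensors.
tex = SimpleNamespace(shape=(1, 512, 512, 16), device="cuda:0")
uv = SimpleNamespace(shape=(1, 0, 1, 2), device="cuda:0")

lines, problems = summarize_texture_inputs(tex=tex, uv=uv)
for line in lines + problems:
    print(line)
```

Comparing the printed summaries between the two runs shows quickly whether the viewer thread passes differently shaped inputs or ends up on a different device.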

Actually, do you know if your code manages to run any Cuda kernel successfully before this crash? Executing rasterize() with a RasterizeCudaContext and executing interpolate() both perform Cuda launches, and if your code seems to run those successfully, you should check whether their outputs look correct. Otherwise, it might be that something in the viewer-enabled setup breaks Cuda completely.

Finally, I suppose it is possible that if the GPU memory is filled to the brim by something else, it might cause a launch failure. You'd have to hit the limit very accurately, though, to not have any spare room for the launch data.
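One way to check the memory hypothesis is to log the free device memory just before the failing call. This is a minimal sketch assuming PyTorch is installed; `torch.cuda.mem_get_info()` reports free and total bytes for the current device. The wrapper name `free_gpu_memory_bytes` is hypothetical, and it returns None when PyTorch or Cuda is unavailable.

```python
def free_gpu_memory_bytes():
    """Return (free_bytes, total_bytes) for the current Cuda device, or None."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    return free_bytes, total_bytes


info = free_gpu_memory_bytes()
if info is None:
    print("Cuda (or PyTorch) is not available")
else:
    free_bytes, total_bytes = info
    print(f"free: {free_bytes / 2**20:.0f} MiB of {total_bytes / 2**20:.0f} MiB")
```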

If nothing seems to help, updating to the latest Cuda toolkit and GPU driver is always a good idea, if you haven't done so already.