NVlabs / nvdiffrast

Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

An error occurs when the for loop runs the task multiple times: Cuda error #139

Closed: YuJinB123 closed this 9 months ago

YuJinB123 commented 9 months ago

I encountered a problem when running multiple tasks in a for loop. At some task, the following error occurs. I don't know how to track it down; it happens at random:

  rast, _ = dr.rasterize(self.glctx, uv.unsqueeze(0), ft, (h, w)) # [1, h, w, 4]
File "/usr/local/lib/python3.10/dist-packages/nvdiffrast/torch/ops.py", line 310, in rasterize
  return _rasterize_func.apply(glctx, pos, tri, resolution, ranges, grad_db, -1)
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 506, in apply
  return super().apply(*args, **kwargs) # type: ignore[misc]
File "/usr/local/lib/python3.10/dist-packages/nvdiffrast/torch/ops.py", line 246, in forward
  out, out_db = _get_plugin(gl=True).rasterize_fwd_gl(raster_ctx.cpp_wrapper, pos, tri, resolution, ranges, peeling_idx)

terminate called after throwing an instance of 'c10::Error'
  what(): Cuda error: 709[cudaGraphicsUnregisterResource(s.cudaColorBuffer[i]);]
Exception raised from rasterizeReleaseBuffers at /usr/local/lib/python3.10/dist-packages/nvdiffrast/common/rasterize_gl.cpp:620 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f8486ea34d7 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f8486e6d36b in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #2: rasterizeReleaseBuffers(int, RasterizeGLState&) + 0x25e (0x7f832e29cead in /root/.cache/torch_extensions/py310_cu118/nvdiffrast_plugin_gl/nvdiffrast_plugin_gl.so)
frame #3: RasterizeGLStateWrapper::~RasterizeGLStateWrapper() + 0x37 (0x7f832e2b5b87 in /root/.cache/torch_extensions/py310_cu118/nvdiffrast_plugin_gl/nvdiffrast_plugin_gl.so)
frame #4: std::default_delete<RasterizeGLStateWrapper>::operator()(RasterizeGLStateWrapper*) const + 0x26 (0x7f832e2ae844 in /root/.cache/torch_extensions/py310_cu118/nvdiffrast_plugin_gl/nvdiffrast_plugin_gl.so)
frame #5: std::unique_ptr<RasterizeGLStateWrapper, std::default_delete<RasterizeGLStateWrapper> >::~unique_ptr() + 0x56 (0x7f832e2ac82c in /root/.cache/torch_extensions/py310_cu118/nvdiffrast_plugin_gl/nvdiffrast_plugin_gl.so)
frame #6: <unknown function> + 0xbb3bb (0x7f832e2ab3bb in /root/.cache/torch_extensions/py310_cu118/nvdiffrast_plugin_gl/nvdiffrast_plugin_gl.so)
frame #7: <unknown function> + 0x3bd6fb (0x7f84806606fb in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0x3be66f (0x7f848066166f in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)
frame #31: <unknown function> + 0x29d90 (0x7f855c92ed90 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #32: __libc_start_main + 0x80 (0x7f855c92ee40 in /usr/lib/x86_64-linux-gnu/libc.so.6)
YuJinB123 commented 9 months ago

My code:

self.glctx = dr.RasterizeGLContext(output_db=False)

s-laine commented 9 months ago

It looks like you're creating and destroying the rasterizer context at every iteration, which is not the intended way to do it. Your system probably runs out of some internal OpenGL resource and that's what eventually causes the crash.

Try creating one dr.RasterizeGLContext and using that same context throughout the program. I'm pretty sure that will fix the problem.
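For reference, a minimal sketch of that structure (the geometry, resolution, and loop count below are illustrative placeholders, not code from this issue):

import torch
import nvdiffrast.torch as dr

# Create the OpenGL rasterizer context once, outside the task loop, and reuse it.
# Constructing a fresh dr.RasterizeGLContext per task leaks GL/CUDA interop
# resources until a later iteration fails with an error like the one above.
glctx = dr.RasterizeGLContext(output_db=False)

# Dummy single-triangle inputs, just to keep the sketch self-contained;
# in practice these would be the per-task vertex and index tensors.
pos = torch.tensor([[[-0.8, -0.8, 0.0, 1.0],
                     [ 0.8, -0.8, 0.0, 1.0],
                     [ 0.0,  0.8, 0.0, 1.0]]], dtype=torch.float32, device='cuda')
tri = torch.tensor([[0, 1, 2]], dtype=torch.int32, device='cuda')

for task_idx in range(100):
    # Same context on every iteration; only the inputs change per task.
    rast, _ = dr.rasterize(glctx, pos, tri, resolution=(256, 256))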

YuJinB123 commented 9 months ago

That solved my problem. Thank you very much.