NVlabs / nvdiffrast

Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

*** stack smashing detected ***: terminated #181

Closed manhdan226 closed 4 months ago

manhdan226 commented 4 months ago

Hi @s-laine and everyone,

When I ran RasterizeCudaContext on WSL2 Ubuntu 22.04, I got this error: *** stack smashing detected ***: terminated

I think it is caused by the line self.cpp_wrapper = _get_plugin().RasterizeCRStateWrapper(cuda_device_idx), but I have no idea how to solve it. Can you help me with this?

Environment:
torch 1.13.1 + cuda 11.6
tensorflow 2.8
tensorflow-gpu 2.8

Thank you so much!

s-laine commented 4 months ago

I haven't seen this before, and it could be a number of things. This is most likely the line where the nvdiffrast C++ extension is compiled and loaded for the first time, and where a small bit of its code is executed.

In the nvdiffrast code that this line executes, I cannot see anything that could even theoretically corrupt the stack frame. That leads me to believe it might be related to the WSL2 environment, missing or outdated libraries (CUDA, GPU drivers, etc.), the plugin compiler, or possibly even PyTorch's C++ extension wrapper. Most likely it's some sort of compatibility issue.
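To narrow down which of these candidates applies, it may help to record the environment first. The following is only a minimal sketch: the helper names is_wsl and environment_report are hypothetical, and checking for "microsoft" in the kernel release string is a common heuristic for WSL, not an official test.

```python
import platform
import sys


def is_wsl(kernel_release):
    """Heuristic (assumption): WSL kernels report 'microsoft' in the release string."""
    return "microsoft" in kernel_release.lower()


def environment_report():
    """Collect version details relevant to a possible compatibility issue."""
    release = platform.uname().release
    report = {
        "python": sys.version.split()[0],
        "kernel": release,
        "wsl": is_wsl(release),
    }
    try:
        import torch  # optional: only reported when PyTorch is installed
        report["torch"] = torch.__version__
        report["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        report["torch"] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Comparing this output between a failing and a working machine might show which component differs.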

If you can run Python under GDB or get a C++ stack trace of the crash some other way, that might tell which function is to blame and which library it is in. I cannot help with details on how to do this, though, as I'm not familiar with Linux debuggers.
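One lightweight option that does not need a debugger is Python's standard faulthandler module. The sketch below is an assumption about how it could be applied here, not something tried in this thread: it installs handlers for fatal signals, including SIGABRT, which is what glibc normally raises after a stack smashing report. It prints only the Python-level traceback, so it can confirm which Python line triggers the crash but will not show the C++ frames. The function create_context is a hypothetical wrapper.

```python
import faulthandler

# Dump the Python traceback of all threads to stderr when the process
# receives SIGSEGV, SIGFPE, SIGABRT, SIGBUS, or SIGILL.
faulthandler.enable()


def create_context():
    """Hypothetical wrapper around the call that crashed in this report."""
    # Assumption: nvdiffrast is installed and a CUDA device is available.
    import nvdiffrast.torch as dr
    return dr.RasterizeCudaContext()


if __name__ == "__main__":
    create_context()
```

For native frames, running the script as gdb --args python script.py, then typing run and, after the crash, bt, is the usual GDB workflow.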

To see if the plugin even compiles successfully, you can try setting verbose=True in nvdiffrast/torch/ops.py line 118 and clearing the compilation cache directory before running again. To see where the cache directory is located, start Python and run:

import torch.utils.cpp_extension
print(torch.utils.cpp_extension._get_build_directory('nvdiffrast_plugin', False))
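Clearing the cache can also be done from Python. This is a minimal sketch: clear_plugin_cache and nvdiffrast_build_dir are hypothetical helper names, and _get_build_directory is a private PyTorch function (the same one used above), so its behavior may change between versions.

```python
import shutil
from pathlib import Path


def clear_plugin_cache(build_dir):
    """Delete build_dir and its contents; return True if something was removed."""
    path = Path(build_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


def nvdiffrast_build_dir():
    """Look up the plugin cache directory via PyTorch's private helper."""
    # Assumption: PyTorch is installed; imported lazily so the helper above
    # stays usable without it.
    import torch.utils.cpp_extension
    return torch.utils.cpp_extension._get_build_directory("nvdiffrast_plugin", False)
```

Calling clear_plugin_cache(nvdiffrast_build_dir()) before the next run should force the plugin to be recompiled, so that the verbose build output is shown.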

manhdan226 commented 4 months ago

@s-laine I tried running it on another machine with native Ubuntu 20.04 (not WSL), and it worked. So we can close this issue. Thank you for your support.