NVlabs / nvdiffrast

Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

Segmentation fault in RasterizeCudaContext extension compile #138

Status: Closed (avaer closed this issue 7 months ago)

avaer commented 9 months ago

I'm trying to run the samples on my H100 server (using CUDA 11.8 with the cu118 PyTorch build).

nvcc crashes with a segmentation fault while compiling the extension:

$ python3 samples/torch/triangle.py --cuda
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/ai-cluster/dreamgaussian/nvdiffrast/samples/torch/triangle.py", line 19, in <module>
    glctx = dr.RasterizeCudaContext()
            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/nvdiffrast/torch/ops.py", line 177, in __init__
    self.cpp_wrapper = _get_plugin().RasterizeCRStateWrapper(cuda_device_idx)
                       ^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/nvdiffrast/torch/ops.py", line 118, in _get_plugin
    torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=opts, extra_cuda_cflags=opts+['-lineinfo'], extra_ldflags=ldflags, with_cuda=True, verbose=False)
  File "/home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'nvdiffrast_plugin': [1/5] /usr/bin/nvcc  -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/torch/include -isystem /home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/torch/include/TH -isystem /home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/torch/include/THC -isystem /home/ubuntu/miniconda3/envs/dreamgaussian/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -DNVDR_TORCH -lineinfo -std=c++17 -c /home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/nvdiffrast/common/rasterize.cu -o rasterize.cuda.o 
FAILED: rasterize.cuda.o 
/usr/bin/nvcc  -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/torch/include -isystem /home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/torch/include/TH -isystem /home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/torch/include/THC -isystem /home/ubuntu/miniconda3/envs/dreamgaussian/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -DNVDR_TORCH -lineinfo -std=c++17 -c /home/ubuntu/miniconda3/envs/dreamgaussian/lib/python3.11/site-packages/nvdiffrast/common/rasterize.cu -o rasterize.cuda.o 
Segmentation fault (core dumped)

My nvcc version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

I am using the following PyTorch install:

pip install torch==2.0.1+cu118 torchaudio==2.0.2+cu118 torchvision==0.15.2+cu118 --index-url https://download.pytorch.org/whl/cu118

nvidia-smi:

| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |

cudatoolkit:

cudatoolkit               11.8.0               h6a678d5_0  

I tried clearing the cache as described in https://github.com/NVlabs/nvdiffrast/issues/76#issuecomment-1142009426, but that didn't change the result.
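
For reference, clearing the cache here means deleting the plugin's JIT build directory so that PyTorch recompiles the extension on the next run. The following is only a minimal sketch: `clear_plugin_cache` is a hypothetical helper (not part of nvdiffrast or PyTorch), and it assumes the cache lives under `TORCH_EXTENSIONS_DIR` or, failing that, `~/.cache/torch_extensions` on Linux. The exact layout may differ between PyTorch versions.

```python
import os
import shutil
from pathlib import Path


def clear_plugin_cache(plugin_name="nvdiffrast_plugin", root=None):
    """Delete cached build directories for a JIT-compiled torch extension.

    Hypothetical helper for illustration; returns the paths it removed.
    """
    if root is None:
        # Assumption: PyTorch honours TORCH_EXTENSIONS_DIR and otherwise
        # falls back to ~/.cache/torch_extensions on Linux.
        root = os.environ.get(
            "TORCH_EXTENSIONS_DIR",
            os.path.join(Path.home(), ".cache", "torch_extensions"),
        )
    root = Path(root)
    removed = []
    if not root.is_dir():
        return removed
    # The plugin directory may sit directly under the root or one level
    # down (for example in a per-Python/CUDA subdirectory), so search
    # recursively. sorted() materialises the matches before any deletion.
    for path in sorted(root.rglob(plugin_name)):
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(str(path))
    return removed
```

Calling `clear_plugin_cache()` with no arguments targets the assumed default location; pass `root=` explicitly if the build directory is elsewhere.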

s-laine commented 9 months ago

I cannot think of any other remedy than to try upgrading the CUDA toolkit to get a newer version of nvcc. It obviously shouldn't be possible for any sort of source code to make the compiler crash.
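
When upgrading, it can help to confirm which nvcc release the build actually picks up, since the log above invokes `/usr/bin/nvcc` rather than a compiler from the conda environment. A rough sketch that parses the `release X.Y` field from `nvcc --version` output; both function names are hypothetical, and the parsing assumes the output format shown earlier in this thread:

```python
import re
import subprocess


def parse_nvcc_release(version_text):
    """Return (major, minor) parsed from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_nvcc_release(nvcc="nvcc"):
    """Run the given nvcc binary and report its release.

    Returns None if the binary is missing or exits with an error.
    """
    try:
        result = subprocess.run(
            [nvcc, "--version"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_nvcc_release(result.stdout)
```

For example, `installed_nvcc_release("/usr/bin/nvcc")` checks the specific binary named in the failing build command, which may not be the one found first on `PATH`.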

There have been reports of nvcc crashes such as this and this, although those are with a later version. It might be possible to fiddle with the compiler options to avoid the crash, but that's pretty much voodoo at this point.