Unsupported gpu architecture 'compute_86'

shubhajitbasak commented 2 years ago

I am trying to run the docker container in a GeForce RTX 3090 the command bash ./run_sample.sh ./samples/torch/cube.py --resolution 32 is giving me following error :

Using container image: gltorch:latest Running command: ./samples/torch/cube.py --resolution 32 No output directory specified, not saving log or images Mesh has 12 triangles and 8 vertices. Traceback (most recent call last): File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1533, in _run_ninja_build subprocess.run( File "/opt/conda/lib/python3.8/subprocess.py", line 512, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "./samples/torch/cube.py", line 200, in main() File "./samples/torch/cube.py", line 182, in main fit_cube( File "./samples/torch/cube.py", line 76, in fit_cube glctx = dr.RasterizeGLContext() File "/opt/conda/lib/python3.8/site-packages/nvdiffrast/torch/ops.py", line 151, in init self.cpp_wrapper = _get_plugin().RasterizeGLStateWrapper(output_db, mode == 'automatic', cuda_device_idx) File "/opt/conda/lib/python3.8/site-packages/nvdiffrast/torch/ops.py", line 84, in _get_plugin torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=opts, extra_cuda_cflags=opts, extra_ldflags=ldflags, with_cuda=True, verbose=False) File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 986, in load return _jit_compile( File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1193, in _jit_compile _write_ninja_file_and_build_library( File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1297, in _write_ninja_file_and_build_library _run_ninja_build( File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1555, in _run_ninja_build raise RuntimeError(message) from e RuntimeError: Error building extension 'nvdiffrast_plugin': [1/14] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++14 -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/common/rasterize.cu -o rasterize.cuda.o FAILED: rasterize.cuda.o /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++14 -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/common/rasterize.cu -o rasterize.cuda.o nvcc fatal : Unsupported gpu architecture 'compute_86' [2/14] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++14 -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/common/interpolate.cu -o interpolate.cuda.o FAILED: interpolate.cuda.o /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++14 -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/common/interpolate.cu -o interpolate.cuda.o nvcc fatal : Unsupported gpu architecture 'compute_86' [3/14] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++14 -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/common/texture.cu -o texture.cuda.o FAILED: texture.cuda.o /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++14 -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/common/texture.cu -o texture.cuda.o nvcc fatal : Unsupported gpu architecture 'compute_86' [4/14] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++14 -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/common/antialias.cu -o antialias.cuda.o FAILED: antialias.cuda.o /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++14 -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/common/antialias.cu -o antialias.cuda.o nvcc fatal : Unsupported gpu architecture 'compute_86' [5/14] c++ -MMD -MF common.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/common/common.cpp -o common.o [6/14] c++ -MMD -MF texture.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/common/texture.cpp -o texture.o [7/14] c++ -MMD -MF glutil.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/common/glutil.cpp -o glutil.o [8/14] c++ -MMD -MF torch_rasterize.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/torch/torch_rasterize.cpp -o torch_rasterize.o [9/14] c++ -MMD -MF torch_texture.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/torch/torch_texture.cpp -o torch_texture.o [10/14] c++ -MMD -MF torch_interpolate.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/torch/torch_interpolate.cpp -o torch_interpolate.o [11/14] c++ -MMD -MF rasterize.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/common/rasterize.cpp -o rasterize.o [12/14] c++ -MMD -MF torch_antialias.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/torch/torch_antialias.cpp -o torch_antialias.o [13/14] c++ -MMD -MF torch_bindings.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.8/site-packages/nvdiffrast/torch/torch_bindings.cpp -o torch_bindings.o ninja: build stopped: subcommand failed.

Here is my nvidia-smi output -

+-----------------------------------------------------------------------------+ ``

``

s-laine commented 2 years ago

Your Cuda version seems recent enough to support compute_86, but perhaps there is something wrong about the installation and compiler binaries from an older Cuda version are used for some reason. A similar problem was seen here, and the last couple of messages suggests this can happen due to missing a step when updating Cuda.

Probably not related to this, but a few people have reported that running the examples didn't work until the machine was rebooted after updating the GPU drivers. But that caused a crash a bit later in the code, not in the compilation phase as in your case.

shubhajitbasak commented 2 years ago

Thanks @s-laine for your input, it was with the cuda inside the docker as I am using 3090 I change the base image in the Dockerfile to pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel and it build successfully. It seems previously it is pointing to cuda 10 ( I figured it out by running nvcc -V from docker container) which is not compatible to x_86 architecture.

In another server I have NVIDIA RTX A6000, below is the nvidia-smi output -

Here following the same I am receiving the error -

[F glutil.cpp:338] eglInitialize() failed Aborted (core dumped)

The nvcc -V output with in docker -

nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2021 NVIDIA Corporation Built on Mon_May__3_19:15:13_PDT_2021 Cuda compilation tools, release 11.3, V11.3.109 Build cuda_11.3.r11.3/compiler.29920130_0

The only difference is this server don't have cuda installed in it. Do you think to run we need to install cuda in the host server as well?

s-laine commented 2 years ago

I suppose you meant the Cuda version in the container was 11.0, not 10? Anyway, this indicates that we should update our Dockerfile with a more recent base image to support the Ampere GPUs.

As a side note, compiling the nvdiffrast plugin to a previous architecture, e.g., compute_80 would have been completely fine, and it's a bit of a shame that PyTorch's plugin build system doesn't take into account which architectures are supported by the Cuda toolkit installed in the system.

The failure of eglInitialize() indicates that nvdiffrast was not able to initialize the operating system -provided EGL graphics library needed to get a rendering context for the rasterization op. This has most likely nothing to do with Cuda, but with the graphics drivers installed in the system. Has the system been rebooted since installing the graphics drivers? Are you using a Docker container built on that specific system? We haven't personally worked with setups having Quadro boards in a Linux machine, so I don't know if there are specific quirks related to their graphics drivers, but I wouldn't expect any.

shubhajitbasak commented 2 years ago

resolved the issue in the A6000 server after installing libnvidia-gl package, not sure why it was missing in the original install.

s-laine commented 2 years ago

Version 0.2.7 now has Dockerfile base image upgraded to pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel, adding Cuda compiler support for Ampere GPUs. Closing.

NVlabs / nvdiffrast

Unsupported gpu architecture 'compute_86' #51