NVlabs / nvdiffrast

Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering
Other

Unable to install nvdiffrast on GeForce RTX 3090. #56

Closed: zoe2718 closed this issue 2 years ago

zoe2718 commented 3 years ago

Environment:


cuda 11.2

I have tried pytorch:1.8.0-cuda11.1-cudnn8 and pytorch:1.7.1-cuda11.0-cudnn8, but both failed. I use the provided Dockerfile with only the PyTorch and CUDA versions changed. The command ./run_sample.sh --build-container (or docker build -f docker/Dockerfile -t name:tagname .) runs successfully, but afterwards nvdiffrast is still not installed: importing nvdiffrast.torch raises ModuleNotFoundError: No module named 'nvdiffrast.torch'.

I have successfully installed nvdiffrast with the same steps on a 2080 Ti GPU with CUDA 10.2, but it fails on a 3090 GPU with CUDA 11.2. Does anyone know how to install nvdiffrast on a 3090? Thanks.
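One way to confirm whether nvdiffrast.torch is actually visible to the interpreter you are running (inside the built container versus on the host, for instance) is to ask importlib to locate it without importing the submodule itself. A minimal sketch; the helper name module_available is hypothetical and not part of nvdiffrast:

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` (possibly dotted) can be located by this interpreter."""
    try:
        # For a dotted name, find_spec first imports the parent package and
        # raises ModuleNotFoundError if that parent is missing.
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("nvdiffrast.torch available:", module_available("nvdiffrast.torch"))
```

Run it with the same python you use for import nvdiffrast.torch; printing sys.executable makes it easier to notice when the package was installed into a different environment (for example, only inside the Docker image).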

zoe2718 commented 3 years ago

Here is the command line output when running ./run_sample.sh --build-container:

Sending build context to Docker daemon 11.36MB
Step 1/14 : ARG BASE_IMAGE=pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel
Step 2/14 : FROM $BASE_IMAGE
 ---> 7554ac65eba5
Step 3/14 : RUN apt-get update && apt-get install -y --no-install-recommends pkg-config libglvnd0 libgl1 libglx0 libegl1 libgles2 libglvnd-dev libgl1-mesa-dev libegl1-mesa-dev libgles2-mesa-dev cmake curl
 ---> Using cache
 ---> 2021ade4a5c8
Step 4/14 : ENV PYTHONDONTWRITEBYTECODE=1
 ---> Using cache
 ---> ca9221b6d071
Step 5/14 : ENV PYTHONUNBUFFERED=1
 ---> Using cache
 ---> ec3e675141ce
Step 6/14 : ENV LD_LIBRARY_PATH /usr/lib64:$LD_LIBRARY_PATH
 ---> Using cache
 ---> 54956580fb6d
Step 7/14 : ENV NVIDIA_VISIBLE_DEVICES all
 ---> Using cache
 ---> 400b43470c33
Step 8/14 : ENV NVIDIA_DRIVER_CAPABILITIES compute,utility,graphics
 ---> Using cache
 ---> 1bead4e2f9e5
Step 9/14 : ENV PYOPENGL_PLATFORM egl
 ---> Using cache
 ---> 2ac6364927ab
Step 10/14 : COPY docker/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json
 ---> Using cache
 ---> f7bcbcf17535
Step 11/14 : RUN pip install ninja imageio imageio-ffmpeg
 ---> Running in 40f85408f32e
Collecting ninja
Downloading ninja-1.10.2.3-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB)
Collecting imageio
Downloading imageio-2.13.0-py3-none-any.whl (3.3 MB)
Collecting imageio-ffmpeg
Downloading imageio_ffmpeg-0.4.5-py3-none-manylinux2010_x86_64.whl (26.9 MB)
Collecting pillow>=8.3.2
Downloading Pillow-8.4.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Requirement already satisfied: numpy in /opt/conda/lib/python3.8/site-packages (from imageio) (1.19.2)
Installing collected packages: ninja, pillow, imageio, imageio-ffmpeg
Attempting uninstall: pillow
Found existing installation: Pillow 8.1.0
Uninstalling Pillow-8.1.0:
Successfully uninstalled Pillow-8.1.0
Successfully installed imageio-2.13.0 imageio-ffmpeg-0.4.5 ninja-1.10.2.3 pillow-8.4.0
Removing intermediate container 40f85408f32e
 ---> 7d9a9bf8ff95
Step 12/14 : COPY nvdiffrast /tmp/pip/nvdiffrast/
 ---> e07b7ff78278
Step 13/14 : COPY README.md setup.py /tmp/pip/
 ---> a985488b664c
Step 14/14 : RUN cd /tmp/pip && pip install .
 ---> Running in bafb10e7d7e5
Processing /tmp/pip
Requirement already satisfied: numpy in /opt/conda/lib/python3.8/site-packages (from nvdiffrast==0.2.7) (1.19.2)
Building wheels for collected packages: nvdiffrast
Building wheel for nvdiffrast (setup.py): started
Building wheel for nvdiffrast (setup.py): finished with status 'done'
Created wheel for nvdiffrast: filename=nvdiffrast-0.2.7-py3-none-any.whl size=92264 sha256=bcb73fab8d4628893c608442ab57cde8fc5cddd963469b2b545bba14b5533e71
Stored in directory: /tmp/pip-ephem-wheel-cache-vkr9plma/wheels/3c/e6/6c/927e643f0816c802008017bea0b43743b6e13629535e616820
Successfully built nvdiffrast
Installing collected packages: nvdiffrast
Successfully installed nvdiffrast-0.2.7
Removing intermediate container bafb10e7d7e5
 ---> e5fd4265b280
Successfully built e5fd4265b280
Successfully tagged name:tagname

No python sample given or file '' not found. Exiting.

zoe2718 commented 3 years ago

If I run pip install . instead, nvdiffrast is installed, but an error is raised when executing glctx = dr.RasterizeGLContext(device=device):

Traceback (most recent call last):
  File "/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1673, in _run_ninja_build
    env=env)
  File "/home/wsj/.conda/envs/cu111/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uv_test_nvdiffrast.py", line 66, in <module>
    uv_ops = UVOperation(uv_size, facemodel, device, batch_size=1)
  File "/home/wsj/code/Deep3DFaceRecon/uvuv.py", line 108, in __init__
    self.glctx = dr.RasterizeGLContext(device=device)
  File "/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/torch/ops.py", line 160, in __init__
    self.cpp_wrapper = _get_plugin().RasterizeGLStateWrapper(output_db, mode == 'automatic', cuda_device_idx)
  File "/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/torch/ops.py", line 84, in _get_plugin
    torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=opts, extra_cuda_cflags=opts, extra_ldflags=ldflags, with_cuda=True, verbose=False)
  File "/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1091, in load
    keep_intermediates=keep_intermediates)
  File "/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1302, in _jit_compile
    is_standalone=is_standalone)
  File "/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1407, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1683, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'nvdiffrast_plugin':
[1/4] c++ -MMD -MF torch_rasterize.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-11.2/include -isystem /home/wsj/.conda/envs/cu111/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/torch/torch_rasterize.cpp -o torch_rasterize.o
FAILED: torch_rasterize.o
c++ -MMD -MF torch_rasterize.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-11.2/include -isystem /home/wsj/.conda/envs/cu111/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/torch/torch_rasterize.cpp -o torch_rasterize.o
In file included from /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/torch/../common/rasterize.h:42:0,
                 from /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/torch/torch_rasterize.cpp:12:
/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/torch/../common/glutil.h:36:10: fatal error: EGL/egl.h: No such file or directory
 #include <EGL/egl.h>
          ^~~~~~~~~~~
compilation terminated.
[2/4] c++ -MMD -MF glutil.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-11.2/include -isystem /home/wsj/.conda/envs/cu111/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/glutil.cpp -o glutil.o
FAILED: glutil.o
c++ -MMD -MF glutil.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-11.2/include -isystem /home/wsj/.conda/envs/cu111/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/glutil.cpp -o glutil.o
In file included from /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/glutil.cpp:14:0:
/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/glutil.h:36:10: fatal error: EGL/egl.h: No such file or directory
 #include <EGL/egl.h>
          ^~~~~~~~~~~
compilation terminated.
[3/4] c++ -MMD -MF rasterize.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-11.2/include -isystem /home/wsj/.conda/envs/cu111/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/rasterize.cpp -o rasterize.o
FAILED: rasterize.o
c++ -MMD -MF rasterize.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-11.2/include -isystem /home/wsj/.conda/envs/cu111/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/rasterize.cpp -o rasterize.o
In file included from /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/rasterize.h:42:0,
                 from /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/rasterize.cpp:9:
/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/glutil.h:36:10: fatal error: EGL/egl.h: No such file or directory
 #include <EGL/egl.h>
          ^~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
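Every failing step above stops at the same missing header, EGL/egl.h. A quick way to check for it is sketched below; it assumes a Debian/Ubuntu-style layout where system headers live under /usr/include, and both the directory list and the helper name find_header are illustrative, not part of nvdiffrast. For reference, the Dockerfile's apt-get step (Step 3 in the build log above) installs packages such as libglvnd-dev and libegl1-mesa-dev, which on Ubuntu are expected to provide this header.

```python
from pathlib import Path
from typing import Iterable, Optional

# Assumed default include directories; extend for your distribution.
DEFAULT_INCLUDE_DIRS = ("/usr/include", "/usr/local/include")


def find_header(relative: str, include_dirs: Iterable[str] = DEFAULT_INCLUDE_DIRS) -> Optional[Path]:
    """Return the first include_dir/relative that exists as a file, else None."""
    for directory in include_dirs:
        candidate = Path(directory) / relative
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    hit = find_header("EGL/egl.h")
    print(hit if hit else "EGL/egl.h not found in the assumed include directories")
```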

s-laine commented 3 years ago

Target architecture compute_86, which is native to the RTX 3090, is only supported in Cuda 11.2 and later, so that is probably why the modified Dockerfiles with Cuda 11.0/11.1 fail. PyTorch's C++/Cuda plugin builder always targets the native architecture of the GPU installed in the system, regardless of whether the available Cuda toolkit supports it, and if it does not, compilation fails. Cuda 11.2 in your host environment should work, but apparently EGL is not properly installed there, given that the compiler cannot find the header file EGL/egl.h.
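To make "native architecture" concrete: in the verbose build log later in this thread, the RTX 3090 (compute capability 8.6) leads PyTorch to invoke NVCC with -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86, and an NVCC that does not recognize compute_86 will reject those flags. The hypothetical helper below only reproduces that flag pattern for a given capability; it is an illustration, not PyTorch's own code.

```python
from typing import List


def gencode_flags(major: int, minor: int) -> List[str]:
    """Build the NVCC -gencode pair for one compute capability.

    Mirrors the pattern visible in the build log: one entry embeds PTX
    (code=compute_XY) and one embeds native machine code (code=sm_XY).
    """
    arch = f"{major}{minor}"
    return [
        f"-gencode=arch=compute_{arch},code=compute_{arch}",
        f"-gencode=arch=compute_{arch},code=sm_{arch}",
    ]


if __name__ == "__main__":
    # An RTX 3090 reports compute capability (8, 6).
    print(" ".join(gencode_flags(8, 6)))
```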

I think you have three options here:

1) Use our latest Dockerfile as-is. This is the best option: it uses a base image with up-to-date PyTorch and Cuda versions that support the latest GPUs.

2) Install EGL in your host environment. Use our Dockerfile as a reference for how to do that, as getting a working setup is not trivial.

3) If everything else fails, you can also modify the plugin compilation function _get_plugin() in nvdiffrast/torch/ops.py to force an older target architecture for NVCC. You can do this by setting, e.g., os.environ['TORCH_CUDA_ARCH_LIST'] = '8.0' on line 71. However, sticking to old versions of tools is generally not great: you may run into compatibility issues, and you may not get the best possible performance out of your hardware.
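Option 3 boils down to putting TORCH_CUDA_ARCH_LIST into the environment before PyTorch compiles the plugin. A minimal sketch that does this from your own script rather than by editing ops.py; the helper name force_cuda_arch is hypothetical, and it assumes the plugin is compiled after this call (in the traceback above, creating the first RasterizeGLContext is what triggers the build). PyTorch caches JIT-built extensions (under /root/.cache/torch_extensions/... in the log later in this thread), so a previously built plugin may simply be reused; clearing that cache may be necessary for the new setting to take effect.

```python
import os


def force_cuda_arch(arch: str = "8.0", override: bool = False) -> str:
    """Set TORCH_CUDA_ARCH_LIST for subsequent PyTorch extension builds.

    Call before the nvdiffrast plugin is first compiled. An existing value
    is kept unless override=True. Returns the value now in effect.
    """
    if override or "TORCH_CUDA_ARCH_LIST" not in os.environ:
        os.environ["TORCH_CUDA_ARCH_LIST"] = arch
    return os.environ["TORCH_CUDA_ARCH_LIST"]


if __name__ == "__main__":
    print("TORCH_CUDA_ARCH_LIST =", force_cuda_arch("8.0"))
    # Import nvdiffrast.torch and create the GL context only after this point.
```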

s-laine commented 3 years ago

On the other hand, it appears that since version 1.8.0, PyTorch attempts to clamp the architecture to what the installed Cuda toolkit supports (as seen here). Therefore PyTorch 1.8.0 with Cuda 11.1 should in theory work, compiling to architecture compute_80.
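The clamping idea can be sketched as follows: take the device's compute capability and, if it exceeds the newest architecture available, fall back to that newest one. This is a simplified illustration, not PyTorch's actual implementation; the supported list stands in for whatever the toolchain reports (in PyTorch, torch.cuda.get_arch_list() returns entries such as 'sm_80'), and the helper names are hypothetical.

```python
from typing import Iterable, Tuple

Capability = Tuple[int, int]


def clamp_capability(device: Capability, supported: Iterable[Capability]) -> Capability:
    """Return `device`, lowered to the highest supported capability if needed.

    Raises ValueError if `supported` is empty.
    """
    highest = max(supported)  # tuples compare as (major, minor)
    return min(device, highest)


def to_arch_list_entry(cap: Capability) -> str:
    """Format a capability the way TORCH_CUDA_ARCH_LIST writes it, e.g. '8.0'."""
    return f"{cap[0]}.{cap[1]}"


if __name__ == "__main__":
    # Hypothetical toolchain whose newest target is 8.0, with an RTX 3090 (8.6).
    toolchain = [(7, 0), (7, 5), (8, 0)]
    print(to_arch_list_entry(clamp_capability((8, 6), toolchain)))
```

With that hypothetical toolchain the RTX 3090 is clamped to "8.0", which corresponds to the compute_80 target described above.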

So either that clamping logic fails somehow, or there is some other issue preventing the compilation from succeeding. Setting verbose=True on line 84 of nvdiffrast/torch/ops.py should make compilation errors visible, which may help in diagnosing the problem.
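Once verbose output is enabled, the decisive line can be buried among very long command lines; in the host build log above it is the fatal error: EGL/egl.h: No such file or directory diagnostic. A small hypothetical helper for pulling GCC/Clang-style error diagnostics out of a captured log (it does not recognize NVCC's file(line): error: format):

```python
import re
from typing import List

# Matches GCC/Clang-style diagnostics such as
# "glutil.h:36:10: fatal error: EGL/egl.h: No such file or directory".
_ERROR_RE = re.compile(r"[^\s:]+:\d+(?::\d+)?: (?:fatal )?error: .*")


def extract_errors(log: str) -> List[str]:
    """Return the error diagnostics found in a multi-line build log."""
    errors = []
    for line in log.splitlines():
        match = _ERROR_RE.search(line)
        if match:
            errors.append(match.group(0))
    return errors


if __name__ == "__main__":
    sample = (
        "In file included from glutil.cpp:14:0:\n"
        "glutil.h:36:10: fatal error: EGL/egl.h: No such file or directory\n"
        "compilation terminated.\n"
    )
    for err in extract_errors(sample):
        print(err)
```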

zoe2718 commented 3 years ago

Thanks for your quick response.

I updated the CUDA driver and installed CUDA 11.3.


Then I used the provided Dockerfile to build an image.

Because GPU access on our server is restricted by user groups and permissions, I cannot run ./run_sample.sh ./samples/torch/cube.py --resolution 32 directly (it raises RuntimeError: No CUDA GPUs are available). Instead, I start a container with docker run --privileged -it --gpus all --pid=host -v /home/:/home/ "gltorch:latest" /bin/bash and run python cube.py --resolution 32 inside it. However, I get this error:

No output directory specified, not saving log or images
Mesh has 12 triangles and 8 vertices.
Using /root/.cache/torch_extensions/py37_cu113 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py37_cu113/nvdiffrast_plugin/build.ninja...
Building extension module nvdiffrast_plugin...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/14] c++ -MMD -MF common.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/common/common.cpp -o common.o
[2/14] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++14 -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/common/rasterize.cu -o rasterize.cuda.o
[3/14] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++14 -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/common/antialias.cu -o antialias.cuda.o
[4/14] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++14 -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/common/interpolate.cu -o interpolate.cuda.o
[5/14] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++14 -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/common/texture.cu -o texture.cuda.o
[6/14] c++ -MMD -MF texture.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/common/texture.cpp -o texture.o
[7/14] c++ -MMD -MF rasterize.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/common/rasterize.cpp -o rasterize.o
[8/14] c++ -MMD -MF torch_rasterize.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/torch/torch_rasterize.cpp -o torch_rasterize.o
[9/14] c++ -MMD -MF torch_antialias.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/torch/torch_antialias.cpp -o torch_antialias.o
[10/14] c++ -MMD -MF glutil.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/common/glutil.cpp -o glutil.o
[11/14] c++ -MMD -MF torch_texture.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/torch/torch_texture.cpp -o torch_texture.o
[12/14] c++ -MMD -MF torch_interpolate.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/torch/torch_interpolate.cpp -o torch_interpolate.o
[13/14] c++ -MMD -MF torch_bindings.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/torch/torch_bindings.cpp -o torch_bindings.o
[14/14] c++ common.o glutil.o rasterize.cuda.o rasterize.o interpolate.cuda.o texture.cuda.o texture.o antialias.cuda.o torch_bindings.o torch_rasterize.o torch_interpolate.o torch_texture.o torch_antialias.o -shared -lGL -lEGL -L/opt/conda/lib/python3.7/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o nvdiffrast_plugin.so
Loading extension module nvdiffrast_plugin...
[F glutil.cpp:338] eglInitialize() failed
Aborted (core dumped)

s-laine commented 3 years ago

Nvdiffrast requires an OpenGL device for executing the rasterization op, and EGL is required to get an OpenGL context, i.e., to get access to the graphics pipeline of the GPU. The EGL initialization failure suggests that the OpenGL configuration is somehow not functional in your cluster environment. This could perhaps be an issue with permissions, but I don't think that should result in an EGL initialization failure. Thus it's probably related to some other part of the cluster configuration, and likely not something you can fix without going through the cluster management. Maybe some OS-level Nvidia drivers are missing on the cluster machine?
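One container-side setting worth ruling out: the Dockerfile sets NVIDIA_DRIVER_CAPABILITIES to compute,utility,graphics (Step 8 in the build log above), and with the NVIDIA container runtime the graphics capability is what requests the driver's OpenGL/EGL libraries inside the container. A minimal check, with a hypothetical helper name; it only shows what the container asked for, not whether the host driver actually provides those libraries.

```python
import os
from typing import Optional


def has_graphics_capability(value: Optional[str] = None) -> bool:
    """True if NVIDIA_DRIVER_CAPABILITIES includes 'graphics' (or 'all')."""
    if value is None:
        value = os.environ.get("NVIDIA_DRIVER_CAPABILITIES", "")
    caps = {c.strip().lower() for c in value.split(",") if c.strip()}
    return "graphics" in caps or "all" in caps


if __name__ == "__main__":
    print("graphics capability requested:", has_graphics_capability())
```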

zoe2718 commented 3 years ago

This was indeed caused by the Nvidia driver, which had previously been installed with the -no-opengl-files argument. I reinstalled the driver without -no-opengl-files, and all the problems are gone.
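For anyone hitting the same eglInitialize() failure, one way to check whether a driver installation includes NVIDIA's EGL implementation is to look for its glvnd vendor library, which the 10_nvidia.json file copied in Step 10 of the Dockerfile refers to. A minimal sketch; the file name pattern libEGL_nvidia.so*, the directory list, and the helper name are assumptions to adjust for your distribution. A driver installed without its OpenGL files would be expected to lack this library.

```python
import glob
import os
from typing import Iterable, List

# Assumed library directories (Ubuntu multiarch layout first).
DEFAULT_LIB_DIRS = ("/usr/lib/x86_64-linux-gnu", "/usr/lib64", "/usr/lib")


def find_nvidia_egl(lib_dirs: Iterable[str] = DEFAULT_LIB_DIRS) -> List[str]:
    """Return paths of libEGL_nvidia.so* files found in lib_dirs."""
    hits: List[str] = []
    for directory in lib_dirs:
        hits.extend(sorted(glob.glob(os.path.join(directory, "libEGL_nvidia.so*"))))
    return hits


if __name__ == "__main__":
    found = find_nvidia_egl()
    print("\n".join(found) if found else "libEGL_nvidia not found in the assumed directories")
```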