NVlabs / nvdiffrast

Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

ninja build error when running examples #78

Closed jdagdelen closed 2 years ago

jdagdelen commented 2 years ago

I'm trying to debug an error at the ninja build stage when running the examples. I would appreciate any ideas about what could be causing it.

System: Windows 10, RTX 3090, Visual Studio 2019

>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:24:09_Pacific_Daylight_Time_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
>nvidia-smi
Sun May 29 12:32:09 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 516.01       Driver Version: 516.01       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0  On |                  N/A |
|  0%   35C    P8    19W / 350W |   1847MiB / 24576MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
Traceback (most recent call last):
  File "C:\Users\John Dagdelen\anaconda3\envs\dmodel\lib\site-packages\torch\utils\cpp_extension.py", line 1740, in _run_ninja_build
    subprocess.run(
  File "C:\Users\John Dagdelen\anaconda3\envs\dmodel\lib\subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\John Dagdelen\3D\InfiniteAssetLibrary\nvdiffrast\samples\torch\triangle.py", line 21, in <module>
    glctx = dr.RasterizeGLContext()
  File "C:\Users\John Dagdelen\anaconda3\envs\dmodel\lib\site-packages\nvdiffrast\torch\ops.py", line 160, in __init__
    self.cpp_wrapper = _get_plugin().RasterizeGLStateWrapper(output_db, mode == 'automatic', cuda_device_idx)
  File "C:\Users\John Dagdelen\anaconda3\envs\dmodel\lib\site-packages\nvdiffrast\torch\ops.py", line 84, in _get_plugin
    torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=opts, extra_cuda_cflags=opts, extra_ldflags=ldflags, with_cuda=True, verbose=False)
  File "C:\Users\John Dagdelen\anaconda3\envs\dmodel\lib\site-packages\torch\utils\cpp_extension.py", line 1144, in load
    return _jit_compile(
  File "C:\Users\John Dagdelen\anaconda3\envs\dmodel\lib\site-packages\torch\utils\cpp_extension.py", line 1357, in _jit_compile
    _write_ninja_file_and_build_library(
  File "C:\Users\John Dagdelen\anaconda3\envs\dmodel\lib\site-packages\torch\utils\cpp_extension.py", line 1469, in _write_ninja_file_and_build_library        
    _run_ninja_build(
  File "C:\Users\John Dagdelen\anaconda3\envs\dmodel\lib\site-packages\torch\utils\cpp_extension.py", line 1756, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'nvdiffrast_plugin': [1/1] "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30037\bin\Hostx64\x64/link.exe" common.o glutil.o rasterize.cuda.o rasterize.o interpolate.cuda.o texture.cuda.o texture.o antialias.cuda.o torch_bindings.o torch_rasterize.o torch_interpolate.o torch_texture.o torch_antialias.o /nologo /DLL "/LIBPATH:C:\Users\John Dagdelen\anaconda3\envs\dmodel\lib\site-packages\nvdiffrast\torch\..\lib" /DEFAULTLIB:gdi32 /DEFAULTLIB:opengl32 /DEFAULTLIB:user32 /DEFAULTLIB:setgpu c10.lib c10_cuda.lib torch_cpu.lib torch_cuda_cu.lib -INCLUDE:?_torch_cuda_cu_linker_symbol_op_cuda@native@at@@YA?AVTensor@2@AEBV32@@Z torch_cuda_cpp.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ torch.lib "/LIBPATH:C:\Users\John Dagdelen\anaconda3\envs\dmodel\lib\site-packages\torch\lib" torch_python.lib "/LIBPATH:C:\Users\John Dagdelen\anaconda3\envs\dmodel\libs" "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib/x64" cudart.lib /out:nvdiffrast_plugin.pyd
FAILED: nvdiffrast_plugin.pyd
"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30037\bin\Hostx64\x64/link.exe" common.o glutil.o rasterize.cuda.o rasterize.o interpolate.cuda.o texture.cuda.o texture.o antialias.cuda.o torch_bindings.o torch_rasterize.o torch_interpolate.o torch_texture.o torch_antialias.o /nologo /DLL "/LIBPATH:C:\Users\John Dagdelen\anaconda3\envs\dmodel\lib\site-packages\nvdiffrast\torch\..\lib" /DEFAULTLIB:gdi32 /DEFAULTLIB:opengl32 /DEFAULTLIB:user32 /DEFAULTLIB:setgpu c10.lib c10_cuda.lib torch_cpu.lib torch_cuda_cu.lib -INCLUDE:?_torch_cuda_cu_linker_symbol_op_cuda@native@at@@YA?AVTensor@2@AEBV32@@Z 
torch_cuda_cpp.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ torch.lib "/LIBPATH:C:\Users\John Dagdelen\anaconda3\envs\dmodel\lib\site-packages\torch\lib" torch_python.lib "/LIBPATH:C:\Users\John Dagdelen\anaconda3\envs\dmodel\libs" "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib/x64" cudart.lib /out:nvdiffrast_plugin.pyd

glutil.o : fatal error LNK1000: Internal error during IMAGE::Pass1
ninja: build stopped: subcommand failed.

s-laine commented 2 years ago

I can't say I've ever seen an internal linker error before. I'm only guessing here, but maybe there is a version conflict between your compilation tools, or the CUDA libraries are corrupted. In case it's a transient bug, try clearing the torch extension cache and running the sample again. If the issue persists, all I can think of is reinstalling Visual C++ or the CUDA toolkit, clearing the torch extension cache, and trying again. I understand this isn't much help.

The location of the extension cache may depend on your Python and PyTorch installation, but on my machine it's at %localappdata%\torch_extensions\torch_extensions\Cache\nvdiffrast_plugin. If you have trouble locating it, see what torch.utils.cpp_extension._get_build_directory('nvdiffrast_plugin', False) returns.
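As a rough illustration of the cache-clearing step described above, here is a minimal sketch. The helper names `clear_extension_cache` and `nvdiffrast_build_dir` are made up for this example and are not part of nvdiffrast or PyTorch. The only PyTorch call used is the private `_get_build_directory` function mentioned above, which may change between PyTorch versions.

```python
import shutil
from pathlib import Path


def clear_extension_cache(build_dir):
    """Delete a cached extension build directory.

    Returns True if a directory was removed, False if there was nothing to remove.
    """
    path = Path(build_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True


def nvdiffrast_build_dir(plugin_name="nvdiffrast_plugin"):
    """Ask PyTorch where it caches the JIT build for the given plugin name."""
    # Imported lazily so clear_extension_cache() works without PyTorch installed.
    # _get_build_directory is private to PyTorch; treat this call as an assumption
    # that holds only for versions where the function exists with this signature.
    import torch.utils.cpp_extension as cpp_extension
    return cpp_extension._get_build_directory(plugin_name, False)
```

With PyTorch installed, `clear_extension_cache(nvdiffrast_build_dir())` would remove the cached build so that the next run recompiles the plugin from scratch. The second traceback in this thread shows a plugin named `nvdiffrast_plugin_gl`, so the name to pass depends on the nvdiffrast version in use.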

s-laine commented 2 years ago

@jdagdelen Did you find a solution to this? I'm mostly curious because of the extraordinary error message.

utkarshdwivedi3997 commented 1 year ago

I have this exact same issue. @s-laine, did you ever find a fix for this?

My system details are: Windows 10, Visual Studio 2019 and 2022, RTX 2080 Ti

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

>nvidia-smi
Mon Jun 26 10:46:03 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.98                 Driver Version: 535.98       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti   WDDM  | 00000000:65:00.0  On |                  N/A |
| 40%   43C    P2              55W / 260W |   9819MiB / 11264MiB |     26%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
Traceback (most recent call last):
  File "C:\Users\utkarsh\AppData\Local\miniconda3\envs\nvdiffrec\lib\site-packages\torch\utils\cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "C:\Users\utkarsh\AppData\Local\miniconda3\envs\nvdiffrec\lib\subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\utkarsh\AppData\Local\miniconda3\envs\nvdiffrec\nvdiffrec\train.py", line 556, in <module>
    glctx = dr.RasterizeGLContext()
  File "C:\Users\utkarsh\AppData\Local\miniconda3\envs\nvdiffrec\lib\site-packages\nvdiffrast\torch\ops.py", line 221, in __init__
    self.cpp_wrapper = _get_plugin(gl=True).RasterizeGLStateWrapper(output_db, mode == 'automatic', cuda_device_idx)
  File "C:\Users\utkarsh\AppData\Local\miniconda3\envs\nvdiffrec\lib\site-packages\nvdiffrast\torch\ops.py", line 118, in _get_plugin
    torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=opts, extra_cuda_cflags=opts+['-lineinfo'], extra_ldflags=ldflags, with_cuda=True, verbose=False)
  File "C:\Users\utkarsh\AppData\Local\miniconda3\envs\nvdiffrec\lib\site-packages\torch\utils\cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "C:\Users\utkarsh\AppData\Local\miniconda3\envs\nvdiffrec\lib\site-packages\torch\utils\cpp_extension.py", line 1509, in _jit_compile
    _write_ninja_file_and_build_library(
  File "C:\Users\utkarsh\AppData\Local\miniconda3\envs\nvdiffrec\lib\site-packages\torch\utils\cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "C:\Users\utkarsh\AppData\Local\miniconda3\envs\nvdiffrec\lib\site-packages\torch\utils\cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'nvdiffrast_plugin_gl': [1/1] "C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64/link.exe" common.o glutil.o rasterize_gl.o torch_bindings_gl.o torch_rasterize_gl.o /nologo /DLL /LIBPATH:C:\Users\utkarsh\AppData\Local\miniconda3\envs\nvdiffrec\lib\site-packages\nvdiffrast\torch\..\lib /DEFAULTLIB:gdi32 /DEFAULTLIB:opengl32 /DEFAULTLIB:user32 /DEFAULTLIB:setgpu c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ torch.lib /LIBPATH:C:\Users\utkarsh\AppData\Local\miniconda3\envs\nvdiffrec\lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\Users\utkarsh\AppData\Local\miniconda3\envs\nvdiffrec\libs "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\lib\x64" cudart.lib /out:nvdiffrast_plugin_gl.pyd
FAILED: nvdiffrast_plugin_gl.pyd
"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64/link.exe" common.o glutil.o rasterize_gl.o torch_bindings_gl.o torch_rasterize_gl.o /nologo /DLL /LIBPATH:C:\Users\utkarsh\AppData\Local\miniconda3\envs\nvdiffrec\lib\site-packages\nvdiffrast\torch\..\lib /DEFAULTLIB:gdi32 /DEFAULTLIB:opengl32 /DEFAULTLIB:user32 /DEFAULTLIB:setgpu c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ torch.lib /LIBPATH:C:\Users\utkarsh\AppData\Local\miniconda3\envs\nvdiffrec\lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\Users\utkarsh\AppData\Local\miniconda3\envs\nvdiffrec\libs "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\lib\x64" cudart.lib /out:nvdiffrast_plugin_gl.pyd

glutil.o : fatal error LNK1000: Internal error during IMAGE::Pass1
ninja: build stopped: subcommand failed.

s-laine commented 1 year ago

No, I haven't heard anything new. The only advice I have is the same as above.

utkarshdwivedi3997 commented 1 year ago

@s-laine so I figured out the issue on my end. It happened because I had multiple Visual Studio versions installed (both 2019 and 2022), and the code here was automatically detecting the 2019 path. This auto-detection code doesn't work for the VS2022 Pro version, since its folder is "Pro" instead of "Professional". There are a few other issues in the path here.
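To see which toolchains a glob-based search of this kind would find, something like the sketch below can help. It is not nvdiffrast's actual detection code: the function name `find_msvc_bin_dirs`, the directory layout in the pattern, and the default edition names are assumptions for illustration. The edition names are a parameter because a mismatch in the edition folder name is exactly what is reported above.

```python
import glob
import os


def find_msvc_bin_dirs(roots, editions=("Enterprise", "Professional", "BuildTools", "Community")):
    """List MSVC Hostx64/x64 bin directories under the given Visual Studio roots.

    Results are ordered newest first by (year folder, toolset folder), compared
    as plain strings, which is only a rough ordering.
    """
    found = []
    for root in roots:
        for edition in editions:
            # Assumed layout: <root>/<year>/<edition>/VC/Tools/MSVC/<toolset>/bin/Hostx64/x64
            pattern = os.path.join(root, "*", edition, "VC", "Tools", "MSVC", "*",
                                   "bin", "Hostx64", "x64")
            for path in glob.glob(pattern):
                parts = os.path.relpath(path, root).split(os.sep)
                year, toolset = parts[0], parts[5]
                found.append((year, toolset, path))
    found.sort(reverse=True)
    return [path for _, _, path in found]
```

On a typical machine the roots to try would be the two `Microsoft Visual Studio` folders under `C:\Program Files` and `C:\Program Files (x86)`. If an installed version is missing from the result, the edition names or the roots likely do not match the folders on disk.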

I believe this can also be solved by hard-coding the VS2022 directory into the system PATH, but I didn't want to do that, so I just modified this code.
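For anyone who prefers the PATH route without changing the system-wide setting, a per-process override is one option. The sketch below assumes the extension build picks up the compiler from PATH, which is not confirmed in this thread. `prepend_to_path` is a hypothetical helper, not an nvdiffrast or PyTorch function.

```python
import os


def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    entries = [e for e in env.get("PATH", "").split(os.pathsep) if e]
    # Drop an existing occurrence so the directory is not listed twice.
    entries = [e for e in entries if os.path.normcase(e) != os.path.normcase(directory)]
    env["PATH"] = os.pathsep.join([directory] + entries)
    return env["PATH"]
```

The call would go at the top of the script, before nvdiffrast is imported and before `dr.RasterizeGLContext()` triggers the build, with `directory` set to the `bin\Hostx64\x64` folder of the toolchain you want. The change only lasts for the current Python process.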

But I'm not sure why this was an issue in the first place. Perhaps I missed somewhere that nvdiffrast doesn't compile with VS2019?

s-laine commented 1 year ago

Glad to hear you got it working. There have been some problems with VS paths before, so we should probably take a closer look at them at some point.

Nvdiffrast has been tested to work with VS2019, so that shouldn't be an issue unless some critical components were not installed. Perhaps some part of PyTorch's extension build toolchain found and used VS2022 while the rest used VS2019; the inner workings of the extension builder are mysterious and change from version to version. Mixing and matching compilation artifacts from different VS versions could at least explain the internal error.
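One cheap way to look for the kind of mixing suspected above is to list every PATH directory that contains a given tool. This is only a hint, since the extension builder may locate tools by other means. `find_on_path` is a hypothetical helper written for this example.

```python
import os


def find_on_path(filename, path=None):
    """Return every PATH directory that contains `filename`, in search order."""
    path = os.environ.get("PATH", "") if path is None else path
    hits = []
    for entry in path.split(os.pathsep):
        # Skip empty entries and duplicates; keep the first occurrence of each directory.
        if entry and entry not in hits and os.path.isfile(os.path.join(entry, filename)):
            hits.append(entry)
    return hits
```

Running `find_on_path("cl.exe")` and `find_on_path("link.exe")` in the environment that launches the build shows whether more than one Visual Studio version is reachable, and which one comes first. If the compiler and linker resolve to different toolset versions, that would be consistent with the explanation above, though it would not prove it.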