NVlabs / nvdiffrast

Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

Compilation uses two different VS versions and fails unless PATH variable is explicitly set #154

Closed. xuelongmu closed this issue 9 months ago.

xuelongmu commented 11 months ago

I'm using nvdiffrast with nvdiffrec. I followed the one-time setup using PyTorch 1.10 and CUDA 11.6 on Windows 11, with no errors.

However, I get a compilation error when running train.py. I have both VS 2017 and VS 2022 installed, and I believe the error is caused by nvdiffrec using VS 2022 while nvdiffrast uses VS 2017, or perhaps nvdiffrast is trying to use both.

I am seeing lines like this in the logs:

C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\include\array(577): note: see reference to class template instantiation 'c10::optional<std::shared_ptr<torch::jit::Graph>>' being compiled

and later on:

"C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\bin\Hostx64\x64/link.exe" common.o glutil.o rasterize_gl.o torch_bindings_gl.o torch_rasterize_gl.o /nologo /DLL /LIBPATH:D:\nvdiffrec\.conda\lib\site-packages\nvdiffrast\torch\..\lib /DEFAULTLIB:gdi32 /DEFAULTLIB:opengl32 /DEFAULTLIB:user32 /DEFAULTLIB:setgpu c10.lib c10_cuda.lib torch_cpu.lib torch_cuda_cu.lib -INCLUDE:?_torch_cuda_cu_linker_symbol_op_cuda@native@at@@YA?AVTensor@2@AEBV32@@Z torch_cuda_cpp.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ torch.lib /LIBPATH:D:\nvdiffrec\.conda\lib\site-packages\torch\lib torch_python.lib /LIBPATH:D:\nvdiffrec\.conda\libs "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\lib/x64" cudart.lib /out:nvdiffrast_plugin_gl.pyd
   Creating library nvdiffrast_plugin_gl.lib and object nvdiffrast_plugin_gl.exp
MSVCRT.lib(loadcfg.obj) : error LNK2001: unresolved external symbol __guard_eh_cont_table
MSVCRT.lib(loadcfg.obj) : error LNK2001: unresolved external symbol __guard_eh_cont_count
nvdiffrast_plugin_gl.pyd : fatal error LNK1120: 2 unresolved externals

You can see the two different VS versions being referenced. My guess is that binaries built with VS 2022 for nvdiffrec are incompatible with nvdiffrast being linked with VS 2017. I have attached the full log: diffrast_error.txt
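To spot this kind of toolset mixing quickly, a build log can be scanned for every MSVC toolset directory it mentions. A minimal sketch, assuming the log paths follow the `Microsoft Visual Studio\<year>\<edition>\VC\Tools\MSVC\<version>` layout seen in the excerpts above; the helper name `msvc_toolsets` is hypothetical and not part of nvdiffrast or PyTorch.

```python
import re

# Matches "Microsoft Visual Studio\<year>\<edition>\VC\Tools\MSVC\<version>"
# in Windows-style paths; forward slashes are accepted as separators too.
TOOLSET_RE = re.compile(
    r"Microsoft Visual Studio[\\/](\d{4})[\\/]([^\\/]+)[\\/]VC[\\/]Tools[\\/]MSVC[\\/]([\d.]+)"
)


def msvc_toolsets(log_text):
    """Return the distinct (year, edition, msvc_version) triples found in a build log."""
    return sorted(set(TOOLSET_RE.findall(log_text)))


# Shortened excerpts of the two log lines quoted in this issue.
log = r'''
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\include\array(577): note: see reference to class template instantiation 'c10::optional<std::shared_ptr<torch::jit::Graph>>' being compiled
"C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\bin\Hostx64\x64/link.exe" common.o glutil.o rasterize_gl.o
'''

toolsets = msvc_toolsets(log)
for year, edition, version in toolsets:
    print(f"VS {year} {edition}, MSVC {version}")
if len(toolsets) > 1:
    print("Warning: more than one MSVC toolset is referenced in the same build")
```

Running this over the full attached log instead of the excerpt should show whether any other toolset directories are involved.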

I solved the issue by explicitly adding the directory that contains the VS 2022 cl.exe to my PATH. Running vcvarsall from the 2022 installation did not help; only setting PATH directly worked.
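The same workaround can be applied from Python before nvdiffrast builds its plugin. A minimal sketch, assuming the VS 2022 toolset directory taken from the log above (adjust the edition and MSVC version to your installation) and assuming the plugin is compiled lazily on first use, so PATH has to be changed before that point. `prepend_to_path` and `VS2022_CL_DIR` are hypothetical names.

```python
import os
import shutil

# Assumption: host-x64/target-x64 directory of the VS 2022 toolset seen in the log;
# change the edition and MSVC version to match your own installation.
VS2022_CL_DIR = r"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\bin\Hostx64\x64"


def prepend_to_path(directory, env=os.environ):
    """Put `directory` first on PATH so its cl.exe is found before any other copy."""
    entries = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    # Drop any existing occurrence so the directory appears exactly once, at the front.
    entries = [p for p in entries if os.path.normcase(p) != os.path.normcase(directory)]
    env["PATH"] = os.pathsep.join([directory] + entries)


if os.name == "nt":
    prepend_to_path(VS2022_CL_DIR)
    print("cl.exe now resolves to:", shutil.which("cl"))
# Import nvdiffrast and create the rasterizer context only after this point.
```

Prepending rather than appending is a deliberate choice here: lookups scan PATH front to back, so an entry at the end would still lose to a VS 2017 directory listed earlier.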

I wanted to bring this to light because it was a niche error that took me a while to figure out, so logging it here could help others. From #81 I was led to believe that nvdiffrast would choose 2022 because of the PATH order, but perhaps that assumption is incorrect. Is there a way to make the detection and selection of the VS toolchain more consistent across nvdiffrec and nvdiffrast?
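When several Visual Studio versions are installed, it can help to list every cl.exe reachable through PATH, in the order a lookup would find them; the first entry is the one a plain PATH search picks. A minimal diagnostic sketch; `cl_candidates_on_path` is a hypothetical helper, and it reflects only PATH, not any additional selection logic that nvdiffrast or PyTorch may apply.

```python
import os


def cl_candidates_on_path(path_value=None):
    """List every cl.exe found on PATH, in PATH order (the first match wins)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    found = []
    for entry in path_value.split(os.pathsep):
        candidate = os.path.join(entry, "cl.exe")
        # Skip empty entries and repeated directories.
        if entry and os.path.isfile(candidate) and candidate not in found:
            found.append(candidate)
    return found


for i, cl in enumerate(cl_candidates_on_path()):
    print("selected:" if i == 0 else "shadowed:", cl)
```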

s-laine commented 10 months ago

Finding the right compiler has always been tricky, and it looks like it doesn't quite work in your setup. I think nvdiffrast is somehow trying to use both versions, possibly disagreeing with what PyTorch thinks should be used.

I'm planning to change the compiler detection function as follows in the next maintenance release. If you have the time, you could try substituting this into the _get_plugin() function in nvdiffrast\torch\ops.py to see if it works better. It should locate the latest revision of VS more reliably than the current release version, which has known pitfalls.

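        def find_cl_path():
            import glob
            def get_sort_key(x):
                # Primary criterion is the VS version (year), secondary is the edition,
                # tertiary is the internal MSVC toolset version (compared as a string).
                x = x.split('\\')[3:]
                x[1] = {'BuildTools': '~0', 'Community': '~1', 'Pro': '~2', 'Professional': '~3', 'Enterprise': '~4'}.get(x[1], x[1])
                return x
            # Host-x64/target-x64 compiler directory of every installed VS version and edition.
            vs_relative_path = r"\Microsoft Visual Studio\*\*\VC\Tools\MSVC\*\bin\Hostx64\x64"
            paths = glob.glob(r"C:\Program Files" + vs_relative_path)
            paths += glob.glob(r"C:\Program Files (x86)" + vs_relative_path)
            if paths:
                return sorted(paths, key=get_sort_key)[-1]
            return None  # No Visual Studio installation found.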
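The ordering used by the proposed find_cl_path() can be tried on plain strings, without any Visual Studio installed. In the sketch below, the first two paths are the two toolsets from the log in this issue; the BuildTools path is a hypothetical third install added only for illustration. The key compares the VS year first, then the edition rank (Enterprise > Professional > Pro > Community > BuildTools, with any unrecognized edition ranked below all of these because `~` sorts after letters), then the remaining path components, including the MSVC version compared as a string.

```python
# Same sort key as in the proposed find_cl_path(), lifted out so it can be
# exercised on example paths.
def get_sort_key(x):
    x = x.split('\\')[3:]  # drop "C:", "Program Files[ (x86)]", "Microsoft Visual Studio"
    # Map known editions to '~N' so they rank above unknown editions, ordered by N.
    x[1] = {'BuildTools': '~0', 'Community': '~1', 'Pro': '~2', 'Professional': '~3', 'Enterprise': '~4'}.get(x[1], x[1])
    return x


paths = [
    r"C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\bin\Hostx64\x64",
    r"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\bin\Hostx64\x64",
    # Hypothetical extra install, for illustration only.
    r"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.37.32822\bin\Hostx64\x64",
]

ranked = sorted(paths, key=get_sort_key)
print(ranked[-1])  # the directory find_cl_path() would pick
```

With these inputs the VS 2022 Community directory wins: the year beats the 2017 Professional install, and the edition rank beats the 2022 BuildTools install.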