ninja: build stopped: subcommand failed.

Liuwuyang1026 commented 3 months ago

Describe the bug

RuntimeError: Error building extension 'bias_act_plugin':

[1/2] D:\NVIDA CUDA\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output bias_act.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -IC:\Users\29125\anaconda3\envs\stylegan\lib\site-packages\torch\include -IC:\Users\29125\anaconda3\envs\stylegan\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\29125\anaconda3\envs\stylegan\lib\site-packages\torch\include\TH -IC:\Users\29125\anaconda3\envs\stylegan\lib\site-packages\torch\include\THC "-ID:\NVIDA CUDA\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IC:\Users\29125\anaconda3\envs\stylegan\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17 --use_fast_math -c C:\Users\29125\AppData\Local\torch_extensions\torch_extensions\Cache\py39_cu121\bias_act_plugin\3cb576a0039689487cfba59279dd6d46-nvidia-geforce-rtx-3060-laptop-gpu\bias_act.cu -o bias_act.cuda.o
bias_act.cu
tmpxft_00007c80_00000000-10_bias_act.cudafe1.cpp
[2/2] "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin/link.exe" bias_act.o bias_act.cuda.o /nologo /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ torch.lib /LIBPATH:C:\Users\29125\anaconda3\envs\stylegan\lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\Users\29125\anaconda3\envs\stylegan\libs "/LIBPATH:D:\NVIDA CUDA\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib\x64" cudart.lib /out:bias_act_plugin.pyd
FAILED: bias_act_plugin.pyd
"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin/link.exe" bias_act.o bias_act.cuda.o /nologo /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ torch.lib /LIBPATH:C:\Users\29125\anaconda3\envs\stylegan\lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\Users\29125\anaconda3\envs\stylegan\libs "/LIBPATH:D:\NVIDA CUDA\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib\x64" cudart.lib /out:bias_act_plugin.pyd
Creating library bias_act_plugin.lib and object bias_act_plugin.exp
MSVCRT.lib(loadcfg.obj) : error LNK2001: unresolved external symbol __enclave_config
MSVCRT.lib(loadcfg.obj) : error LNK2001: unresolved external symbol __guard_eh_cont_table
MSVCRT.lib(loadcfg.obj) : error LNK2001: unresolved external symbol __guard_eh_cont_count
MSVCRT.lib(loadcfg.obj) : error LNK2001: unresolved external symbol __volatile_metadata
bias_act_plugin.pyd : fatal error LNK1120: 4 unresolved externals
ninja: build stopped: subcommand failed.

PyTorch version pytorch 2.2.2
CUDA toolkit version CUDA 12.1
NVIDIA driver version
GPU RTX 3060
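
For anyone filling in the version fields above, a minimal Python sketch along these lines could collect them in one go. The helper name `collect_build_info` is made up for illustration and is not part of PyTorch or StyleGAN3. It only reads `torch.__version__` and `torch.version.cuda`, and runs whichever `nvcc` happens to be first on PATH.

```python
import shutil
import subprocess


def collect_build_info():
    """Gather version details for a bug report (hypothetical helper)."""
    info = {"torch": None, "torch_cuda": None, "nvcc_path": None, "nvcc_version": None}

    try:
        import torch  # optional: fields stay None if PyTorch cannot be imported
        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built against
    except ImportError:
        pass

    nvcc = shutil.which("nvcc")  # first nvcc found on PATH, or None
    info["nvcc_path"] = nvcc
    if nvcc is not None:
        result = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True, check=False
        )
        info["nvcc_version"] = result.stdout.strip()  # raw output, not parsed
    return info


if __name__ == "__main__":
    for key, value in collect_build_info().items():
        print(f"{key}: {value}")
```

If `torch_cuda` and the release reported by `nvcc` differ, that mismatch may be worth mentioning in the report, although it is not necessarily the cause of a failure like the one above.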

Liuwuyang1026 commented 3 months ago

I really need your help!! Please!!!

fak111 commented 2 months ago

me too

fak111 commented 2 months ago

I ran into a similar problem on Ubuntu 22 with Anaconda:

Setting up PyTorch plugin "bias_act_plugin"... Failed! : FAILED: bias_act.cuda.o /usr/bin/nvcc -DTORCH_EXTENSION_NAME=bias_act_plugin -...

In my case, removing the system nvcc (the /usr/bin/nvcc shown in the log above) solved the problem: sudo apt remove nvidia-cuda-toolkit
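
One plausible reading of this fix is that the apt-installed /usr/bin/nvcc was being picked up instead of another CUDA toolkit. As a rough way to check before uninstalling anything, the sketch below lists every `nvcc` on PATH in lookup order. `candidates_on_path` is a hypothetical helper written for this thread, not an existing tool. On Windows the name would need to be `nvcc.exe`.

```python
import os


def candidates_on_path(name, path=None):
    """Return every executable called `name` on PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    found = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            found.append(candidate)
    return found


if __name__ == "__main__":
    # The first entry printed is the one a plain `nvcc` call would run.
    for location in candidates_on_path("nvcc"):
        print(location)
```

If more than one path is printed, the first one wins, so reordering PATH may be an alternative to removing the package.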

hengfei-wang commented 3 weeks ago

I ran into a lot of trouble installing the custom CUDA extensions on a cluster without a root account.

Really need a tutorial on how to run this project on a cluster without root privileges. :(

egaznep commented 2 weeks ago

I managed to get the CUDA kernels working by doing the following (should not require admin rights):

0) Install a preferred flavor of conda (miniconda, anaconda, ...) if you don't have it.

1) Create a fresh environment. Install the desired Python version, a torch version from the pytorch channel, as well as the CUDA runtime and library packages. For me I think the following was sufficient (FYI, I just needed the custom CUDA kernels and not the full StyleGAN3 stuff):

  - nvidia::cuda-nvcc=*12.1
  - nvidia::cuda-cudart-dev=*12.1
  - nvidia::cuda-cudart=*12.1
  - nvidia::libcusparse-dev=*12.1
  - nvidia::libcublas-dev=*12.1
  - nvidia::libcusolver-dev

and from pip I got these installed

ipython               8.25.0
ninja                 1.11.1.1
pip                   24.0
scipy                 1.13.1
setuptools            69.5.1
torch                 2.3.1
wheel                 0.43.0

3) When I tried to run stuff, I got errors indicating two headers could not be located, probably because of one of the nvidia conda packages. I had to copy two headers from their original folders to {ENV_DIR}/include/.
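
To find where a missing header actually lives inside the environment before copying it, something like the following sketch might help. `locate_header` is a hypothetical helper, and the header name has to be taken from your own compiler error message, since the two headers are not named in this thread.

```python
from pathlib import Path


def locate_header(env_dir, header_name):
    """Return (already_in_include, other_locations) for a header inside env_dir."""
    env = Path(env_dir)
    target = env / "include" / header_name
    # Search the whole environment for files with the same name elsewhere.
    matches = sorted(
        p for p in env.rglob(header_name) if p.is_file() and p != target
    )
    return target.is_file(), matches
```

For example, `locate_header(os.environ["CONDA_PREFIX"], "<name from the error>")` would report whether the header is already in {ENV_DIR}/include/ and, if not, which folders contain a copy.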

hengfei-wang commented 2 weeks ago

I finally solved this problem. It was related to the CUDA installation: the CUDA install I had been using on the cluster was missing some files. I reloaded a CUDA module from the cluster's pre-installed modules, and then the CUDA extensions compiled successfully.
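
A quick sanity check for an incomplete CUDA install could look like the sketch below. The list of required files is an assumption chosen for illustration (the files that were actually missing are not named here), and `missing_cuda_files` is a hypothetical helper, not a PyTorch or CUDA function.

```python
from pathlib import Path

# Assumed examples of files a CUDA toolkit install normally provides;
# extend this tuple with whatever your build errors mention.
ASSUMED_REQUIRED = ("bin/nvcc", "include/cuda_runtime.h")


def missing_cuda_files(cuda_home, required=ASSUMED_REQUIRED):
    """Return the entries of `required` that do not exist under cuda_home."""
    root = Path(cuda_home)
    return [rel for rel in required if not (root / rel).is_file()]
```

Calling it with the toolkit directory of the currently loaded module, for instance the value of the CUDA_HOME environment variable if your cluster sets one, returns an empty list when all the listed files are present.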

NVlabs / stylegan3

ninja: build stopped: subcommand failed. #640