Liuwuyang1026 opened 3 months ago
I really need your help! Please!
Me too.
I ran into a similar problem on Ubuntu 22 with Anaconda:
Setting up PyTorch plugin "bias_act_plugin"... Failed! : FAILED: bias_act.cuda.o /usr/bin/nvcc -DTORCH_EXTENSION_NAME=bias_act_plugin -...
In my case, removing the system nvcc solved the problem: sudo apt remove nvidia-cuda-toolkit
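To see which nvcc a build is likely to pick up before removing anything, here is a minimal Python sketch. As far as I know, when neither CUDA_HOME nor CUDA_PATH is set, PyTorch's extension builder falls back to the nvcc found on PATH, so a stray /usr/bin/nvcc can win. The sketch assumes nvcc's usual --version banner ("Cuda compilation tools, release 12.1, V12.1.105"), and the helper names are hypothetical. Comparing the reported release with torch.version.cuda tells you whether the two match.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(version_text: str) -> Optional[str]:
    """Extract 'X.Y' from `nvcc --version` output, e.g. '... release 12.1, V12.1.105'."""
    match = re.search(r"release (\d+\.\d+)", version_text)
    return match.group(1) if match else None


def nvcc_on_path() -> Optional[str]:
    """Return the path of the first nvcc found on PATH, or None if there is none."""
    return shutil.which("nvcc")


def nvcc_release(nvcc_path: str) -> Optional[str]:
    """Run `nvcc --version` and parse the release number from its stdout."""
    result = subprocess.run([nvcc_path, "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    path = nvcc_on_path()
    print("nvcc on PATH:", path)
    if path is not None:
        print("nvcc release:", nvcc_release(path))
    try:
        import torch  # only needed for the version comparison

        print("torch was built with CUDA:", torch.version.cuda)
    except ImportError:
        print("torch is not installed in this environment")
```

If the path printed is /usr/bin/nvcc and its release differs from torch.version.cuda, that mismatch is a likely culprit.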
I ran into many problems installing custom CUDA extensions on a cluster without a root account.
I really need a tutorial on how to run this project on a cluster without root privileges. :(
I managed to get the CUDA kernels working with the following steps (they should not require admin rights):
0) Install your preferred flavor of conda (Miniconda, Anaconda, ...) if you don't already have it.
1) Create a fresh environment and install the desired Python version, a torch version from the pytorch channel, and the CUDA runtime and library packages. For me, I think the following was sufficient (FYI, I only needed the custom CUDA kernels, not the full StyleGAN3 setup):
- nvidia::cuda-nvcc=*12.1
- nvidia::cuda-cudart-dev=*12.1
- nvidia::cuda-cudart=*12.1
- nvidia::libcusparse-dev=*12.1
- nvidia::libcublas-dev=*12.1
- nvidia::libcusolver-dev
And from pip, I installed these:
- ipython 8.25.0
- ninja 1.11.1.1
- pip 24.0
- scipy 1.13.1
- setuptools 69.5.1
- torch 2.3.1
- wheel 0.43.0
3) When I tried to run things, I got errors saying that two headers could not be found, probably because of how one of the nvidia conda packages lays out its files. I had to copy the two headers from their original folders into {ENV_DIR}/include/.
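Step 3 can be scripted. This is a minimal sketch, assuming you already know the full paths of the two missing headers (the thread does not name them, so the paths in the usage line are placeholders) and that the environment directory is sys.prefix, which is the same as $CONDA_PREFIX when the environment is activated. The function name copy_headers_into_env is hypothetical.

```python
import shutil
import sys
from pathlib import Path
from typing import Iterable, List


def copy_headers_into_env(header_paths: Iterable[str], env_dir: str = sys.prefix) -> List[Path]:
    """Copy each header into {env_dir}/include/ and return the destination paths."""
    include_dir = Path(env_dir) / "include"
    include_dir.mkdir(parents=True, exist_ok=True)
    copied: List[Path] = []
    for header in map(Path, header_paths):
        if not header.is_file():
            raise FileNotFoundError(f"header not found: {header}")
        destination = include_dir / header.name
        shutil.copy2(header, destination)  # copy2 also preserves file timestamps
        copied.append(destination)
    return copied


# Usage (placeholder paths; use the headers your compiler error reports as missing):
# copy_headers_into_env(["/path/to/first_header.h", "/path/to/second_header.h"])
```

Run it with the environment activated so that sys.prefix points at the environment you build in.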
I finally solved this problem. It was related to the CUDA installation: the CUDA that came with the cluster was missing some files. I loaded a different CUDA module from the cluster's pre-installed modules, and then the CUDA extensions compiled successfully.
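Before switching modules, it can help to check whether the toolkit that is currently active actually contains what an extension build needs. A sketch under stated assumptions: the toolkit root is read from CUDA_HOME or CUDA_PATH (PyTorch's extension builder consults these too), and the list of required files is illustrative for a Linux layout, not exhaustive; adjust it to whatever your error message says is missing. Both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional, Sequence

# Illustrative Linux layout only; on Windows nvcc is nvcc.exe and libraries live under lib/x64.
DEFAULT_REQUIRED = ("bin/nvcc", "include/cuda_runtime.h", "lib64/libcudart.so")


def cuda_root_from_env() -> Optional[str]:
    """Return the toolkit root from CUDA_HOME or CUDA_PATH, if either is set."""
    return os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")


def missing_toolkit_files(cuda_root: str, required: Sequence[str] = DEFAULT_REQUIRED) -> List[str]:
    """List the required relative paths that do not exist under cuda_root."""
    root = Path(cuda_root)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    root = cuda_root_from_env()
    if root is None:
        print("Neither CUDA_HOME nor CUDA_PATH is set")
    else:
        missing = missing_toolkit_files(root)
        print("missing:", missing or "nothing")
```

Running it once with the default module and once after loading another CUDA module shows whether the switch fills in the missing pieces.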
Describe the bug
RuntimeError: Error building extension 'bias_act_plugin': [1/2] D:\NVIDA CUDA\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output bias_act.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -IC:\Users\29125\anaconda3\envs\stylegan\lib\site-packages\torch\include -IC:\Users\29125\anaconda3\envs\stylegan\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\29125\anaconda3\envs\stylegan\lib\site-packages\torch\include\TH -IC:\Users\29125\anaconda3\envs\stylegan\lib\site-packages\torch\include\THC "-ID:\NVIDA CUDA\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IC:\Users\29125\anaconda3\envs\stylegan\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17 --use_fast_math -c C:\Users\29125\AppData\Local\torch_extensions\torch_extensions\Cache\py39_cu121\bias_act_plugin\3cb576a0039689487cfba59279dd6d46-nvidia-geforce-rtx-3060-laptop-gpu\bias_act.cu -o bias_act.cuda.o
bias_act.cu
tmpxft_00007c80_00000000-10_bias_act.cudafe1.cpp
[2/2] "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin/link.exe" bias_act.o bias_act.cuda.o /nologo /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ torch.lib /LIBPATH:C:\Users\29125\anaconda3\envs\stylegan\lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\Users\29125\anaconda3\envs\stylegan\libs "/LIBPATH:D:\NVIDA CUDA\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib\x64" cudart.lib /out:bias_act_plugin.pyd
FAILED: bias_act_plugin.pyd
"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin/link.exe" bias_act.o bias_act.cuda.o /nologo /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ torch.lib /LIBPATH:C:\Users\29125\anaconda3\envs\stylegan\lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\Users\29125\anaconda3\envs\stylegan\libs "/LIBPATH:D:\NVIDA CUDA\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib\x64" cudart.lib /out:bias_act_plugin.pyd
Creating library bias_act_plugin.lib and object bias_act_plugin.exp
MSVCRT.lib(loadcfg.obj) : error LNK2001: unresolved external symbol __enclave_config
MSVCRT.lib(loadcfg.obj) : error LNK2001: unresolved external symbol __guard_eh_cont_table
MSVCRT.lib(loadcfg.obj) : error LNK2001: unresolved external symbol __guard_eh_cont_count
MSVCRT.lib(loadcfg.obj) : error LNK2001: unresolved external symbol __volatile_metadata
bias_act_plugin.pyd : fatal error LNK1120: 4 unresolved externals
ninja: build stopped: subcommand failed.
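When a Windows build fails at the link step like this, two details are worth pulling out of the log: which link.exe ninja actually invoked, and which symbols are unresolved. Here is a small sketch that extracts both from a saved build log. It matches the English LNK2001 message as well as the Chinese wording a localized MSVC prints (无法解析的外部符号), and it assumes plain C symbol names (quoted C++ signatures are only partially captured). The function name summarize_link_failure is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# English and Simplified Chinese wording of MSVC's LNK2001 message; captures the symbol token.
_UNRESOLVED = re.compile(r"LNK2001:\s*(?:unresolved external symbol|无法解析的外部符号)\s+(\S+)")
# link.exe path as ninja prints it: quoted (may contain spaces) or unquoted.
_LINKER = re.compile(r'"([^"]*link\.exe)"|(\S*link\.exe)', re.IGNORECASE)


def summarize_link_failure(log: str) -> Tuple[Optional[str], List[str]]:
    """Return (first link.exe path in the log, sorted unique unresolved symbols)."""
    linker_match = _LINKER.search(log)
    linker = None
    if linker_match:
        linker = linker_match.group(1) or linker_match.group(2)
    symbols = sorted(set(_UNRESOLVED.findall(log)))
    return linker, symbols
```

Comparing the reported linker path with the Visual Studio installation you expect the build to use is a reasonable first check.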