Zhendong-Wang / Diffusion-GAN

Official PyTorch implementation for paper: Diffusion-GAN: Training GANs with Diffusion
MIT License

ninja: build stopped: subcommand failed. #23

Open octadion opened 1 year ago

octadion commented 1 year ago

I get the error "ninja: build stopped: subcommand failed" when training Diffusion-StyleGAN2. Because of this, I also get warnings.warn('Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation. Details:\n\n' + traceback.format_exc()), as well as "no module named upfirdn2d".

My environment matches environment.yml, and my gcc version is 9.4.0.

Traceback (most recent call last):
  File "/opt/conda/envs/difgan/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1808, in _run_ninja_build
    subprocess.run(
  File "/opt/conda/envs/difgan/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/octadion/diffusion-gan/Diffusion-GAN/diffusion-stylegan2/torch_utils/ops/bias_act.py", line 41, in _init
    _plugin = custom_ops.get_plugin('bias_act_plugin', sources=sources, extra_cuda_cflags=['--use_fast_math'])
  File "/home/octadion/diffusion-gan/Diffusion-GAN/diffusion-stylegan2/torch_utils/custom_ops.py", line 103, in get_plugin
    torch.utils.cpp_extension.load(name=module_name, verbose=verbose_build, sources=sources, **build_kwargs)
  File "/opt/conda/envs/difgan/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1202, in load
    return _jit_compile(
  File "/opt/conda/envs/difgan/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1425, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/opt/conda/envs/difgan/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1537, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/opt/conda/envs/difgan/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'bias_act_plugin':
[1/3] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/envs/difgan/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_BFLOAT16_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' --use_fast_math -std=c++14 -c /home/octadion/diffusion-gan/Diffusion-GAN/diffusion-stylegan2/torch_utils/ops/bias_act.cu -o bias_act.cuda.o
FAILED: bias_act.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/envs/difgan/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_BFLOAT16_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' --use_fast_math -std=c++14 -c /home/octadion/diffusion-gan/Diffusion-GAN/diffusion-stylegan2/torch_utils/ops/bias_act.cu -o bias_act.cuda.o
/bin/sh: 1: /usr/local/cuda/bin/nvcc: not found
[2/3] c++ -MMD -MF bias_act.o.d -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/envs/difgan/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/octadion/diffusion-gan/Diffusion-GAN/diffusion-stylegan2/torch_utils/ops/bias_act.cpp -o bias_act.o
FAILED: bias_act.o
c++ -MMD -MF bias_act.o.d -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/envs/difgan/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/octadion/diffusion-gan/Diffusion-GAN/diffusion-stylegan2/torch_utils/ops/bias_act.cpp -o bias_act.o
In file included from /home/octadion/diffusion-gan/Diffusion-GAN/diffusion-stylegan2/torch_utils/ops/bias_act.cpp:10:
/opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/ATen/cuda/CUDAContext.h:5:10: fatal error: cuda_runtime_api.h: No such file or directory
    5 | #include <cuda_runtime_api.h>
      |          ^~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.

Zhendong-Wang commented 1 year ago

If the newest PyTorch doesn't work in your case, try this one: https://github.com/Zhendong-Wang/Diffusion-GAN/blob/main/diffusion-insgen/environment.yml, which uses PyTorch 1.8.1 and Python 3.8 and should work on most machines.
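
Before switching environments, it can help to record which versions are actually in use. The following is only a sketch: the function name describe_torch_build is made up for illustration, and it simply returns None values if PyTorch is not installed.

```python
import importlib.util


def describe_torch_build():
    """Collect the versions that matter when PyTorch compiles CUDA extensions.

    describe_torch_build is a hypothetical helper name, not part of PyTorch.
    """
    info = {
        "torch": None,            # installed PyTorch version
        "built_with_cuda": None,  # CUDA version the PyTorch binary was built against
        "cuda_available": None,   # whether a GPU and driver are usable at runtime
        "cuda_home": None,        # toolkit root PyTorch will use to find nvcc and headers
    }
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this environment

    import torch

    info["torch"] = torch.__version__
    info["built_with_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    try:
        from torch.utils import cpp_extension
        info["cuda_home"] = cpp_extension.CUDA_HOME
    except ImportError:
        pass  # leave cuda_home as None if the extension helper cannot be imported
    return info


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

If cuda_home is None or points to a directory without nvcc, the custom ops cannot be compiled regardless of the PyTorch version.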

sahilqure commented 1 year ago

I tried PyTorch 1.8.1 with Python 3.8, but it still shows the same error.

Zhendong-Wang commented 1 year ago

That is weird. The environment.yml works fine on my Linux/Ubuntu machines. Below is the conda list output for my env built from that environment.yml.

(difgan) zdwang@#######:~$ conda list
# packages in environment at /home/zdwang/miniconda3/envs/difgan:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
appdirs                   1.4.4              pyhd3eb1b0_0  
blas                      1.0                         mkl  
brotlipy                  0.7.0           py39h27cfd23_1003  
ca-certificates           2023.01.10           h06a4308_0  
certifi                   2022.12.7        py39h06a4308_0  
cffi                      1.15.1           py39h5eee18b_3  
charset-normalizer        2.0.4              pyhd3eb1b0_0  
click                     8.0.4            py39h06a4308_0  
cryptography              39.0.1           py39h9ce1e76_0  
cudatoolkit               11.6.0               habf752d_9    nvidia
flit-core                 3.8.0            py39h06a4308_0  
freetype                  2.12.1               h4a9f257_0  
giflib                    5.2.1                h5eee18b_3  
idna                      3.4              py39h06a4308_0  
imageio                   2.26.0           py39h06a4308_0  
imageio-ffmpeg            0.4.8                    pypi_0    pypi
intel-openmp              2021.4.0          h06a4308_3561  
jpeg                      9e                   h5eee18b_1  
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.38                 h1181459_1  
lerc                      3.0                  h295c915_0  
libdeflate                1.17                 h5eee18b_0  
libffi                    3.4.2                h6a678d5_6  
libgcc-ng                 11.2.0               h1234567_1  
libgfortran-ng            11.2.0               h00389a5_1  
libgfortran5              11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libpng                    1.6.39               h5eee18b_0  
libstdcxx-ng              11.2.0               h1234567_1  
libtiff                   4.5.0                h6a678d5_2  
libwebp                   1.2.4                h11a3e52_1  
libwebp-base              1.2.4                h5eee18b_1  
lz4-c                     1.9.4                h6a678d5_0  
mkl                       2021.4.0           h06a4308_640  
mkl-service               2.4.0            py39h7f8727e_0  
mkl_fft                   1.3.1            py39hd3c417c_0  
mkl_random                1.2.2            py39h51133e4_0  
ncurses                   6.4                  h6a678d5_0  
ninja                     1.10.2               h06a4308_5  
ninja-base                1.10.2               hd09550d_5  
numpy                     1.23.5           py39h14f4228_0  
numpy-base                1.23.5           py39h31eccc5_0  
openssl                   1.1.1t               h7f8727e_0  
packaging                 23.0             py39h06a4308_0  
pillow                    9.4.0            py39h6a678d5_0  
pip                       23.0.1           py39h06a4308_0  
pooch                     1.4.0              pyhd3eb1b0_0  
psutil                    5.9.0            py39h5eee18b_0  
pycparser                 2.21               pyhd3eb1b0_0  
pyopenssl                 23.0.0           py39h06a4308_0  
pysocks                   1.7.1            py39h06a4308_0  
pyspng                    0.1.1                    pypi_0    pypi
python                    3.9.16               h7a1cb2a_2  
pytorch                   1.12.1          py3.9_cuda11.6_cudnn8.3.2_0    pytorch
pytorch-mutex             1.0                        cuda    pytorch
readline                  8.2                  h5eee18b_0  
requests                  2.28.1           py39h06a4308_1  
scipy                     1.10.0           py39h14f4228_1  
setuptools                65.6.3           py39h06a4308_0  
six                       1.16.0             pyhd3eb1b0_1  
sqlite                    3.41.1               h5eee18b_0  
tk                        8.6.12               h1ccaba5_0  
tqdm                      4.65.0           py39hb070fc8_0  
typing_extensions         4.4.0            py39h06a4308_0  
tzdata                    2022g                h04d1e81_0  
urllib3                   1.26.14          py39h06a4308_0  
wheel                     0.38.4           py39h06a4308_0  
xz                        5.2.10               h5eee18b_1  
zlib                      1.2.13               h5eee18b_0  
zstd                      1.5.4                hc292b87_0
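
To compare a local environment against the listing above, one option is to parse both conda list outputs and diff a few key packages. This is a rough sketch: parse_conda_list and diff_versions are hypothetical helper names, the reference versions are copied from the listing above, and the "local" versions are invented purely for illustration.

```python
def parse_conda_list(text):
    """Map package name -> version from the text printed by `conda list`."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def diff_versions(reference, local, names):
    """Return {name: (reference_version, local_version)} for packages that differ."""
    return {
        name: (reference.get(name), local.get(name))
        for name in names
        if reference.get(name) != local.get(name)
    }


# Versions taken from the listing above.
REFERENCE = """
# Name                    Version                   Build  Channel
cudatoolkit               11.6.0               habf752d_9    nvidia
ninja                     1.10.2               h06a4308_5
pytorch                   1.12.1          py3.9_cuda11.6_cudnn8.3.2_0    pytorch
"""

# Hypothetical local output, made up for this example.
LOCAL = """
cudatoolkit               11.3.1               h2bc3f7f_2
ninja                     1.10.2               h06a4308_5
"""

if __name__ == "__main__":
    differences = diff_versions(
        parse_conda_list(REFERENCE),
        parse_conda_list(LOCAL),
        ["cudatoolkit", "ninja", "pytorch"],
    )
    for name, (ref, loc) in differences.items():
        print(f"{name}: reference={ref} local={loc}")
```

Note that the conda cudatoolkit package provides runtime libraries for PyTorch; matching its version does not by itself guarantee that nvcc and the CUDA headers are present on the system.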

You could also try this Dockerfile: https://github.com/NVlabs/stylegan2-ada-pytorch/blob/main/Dockerfile

octadion commented 1 year ago

I have found my problem: it appears that with my PyTorch installation there is no runtime_api in /usr/local/cuda/, which seems to be why ninja does not run. This is confusing, because on a different machine of mine everything works normally. Do you know of a solution to the missing runtime_api? I have tried reinstalling different versions, but nothing has changed.

Zhendong-Wang commented 1 year ago

I have never encountered this error. Could it be related to the CUDA installation rather than PyTorch, since you said 'there is no runtime_api in /usr/local/cuda/'?
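
The build log in this thread fails on two concrete files: /usr/local/cuda/bin/nvcc ("not found") and cuda_runtime_api.h ("No such file or directory"). A small check like the one below can confirm whether a given CUDA toolkit directory contains them. It is a sketch only: check_cuda_toolkit and candidate_cuda_homes are hypothetical helper names, and the list of places to look (the CUDA_HOME and CUDA_PATH environment variables, nvcc on PATH, then /usr/local/cuda) is an assumption about where a toolkit might live, not a statement of how PyTorch resolves it.

```python
import os
import shutil
from pathlib import Path


def check_cuda_toolkit(cuda_home):
    """Report whether the two files the failed build needed exist under cuda_home."""
    root = Path(cuda_home)
    return {
        "cuda_home": str(root),
        "nvcc": (root / "bin" / "nvcc").is_file(),
        "cuda_runtime_api.h": (root / "include" / "cuda_runtime_api.h").is_file(),
    }


def candidate_cuda_homes():
    """List plausible toolkit roots (assumed search order, for illustration only)."""
    candidates = []
    for var in ("CUDA_HOME", "CUDA_PATH"):
        value = os.environ.get(var)
        if value:
            candidates.append(value)
    nvcc = shutil.which("nvcc")
    if nvcc:
        # nvcc normally lives in <toolkit root>/bin/nvcc
        candidates.append(str(Path(nvcc).resolve().parent.parent))
    candidates.append("/usr/local/cuda")
    return list(dict.fromkeys(candidates))  # drop duplicates, keep order


if __name__ == "__main__":
    for home in candidate_cuda_homes():
        print(check_cuda_toolkit(home))
```

If both entries are False for /usr/local/cuda, the machine likely has only the driver or runtime libraries and not the full toolkit; installing a toolkit that matches the PyTorch build, or pointing CUDA_HOME at an existing one, would be the next thing to try.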