Open octadion opened 1 year ago
If the newest PyTorch doesn't work in your case, try the environment at https://github.com/Zhendong-Wang/Diffusion-GAN/blob/main/diffusion-insgen/environment.yml, which uses PyTorch 1.8.1 and Python 3.8 and should work on most machines.
I tried PyTorch 1.8.1 with Python 3.8, but it still shows the same error.
That is weird. The environment.yml works fine on my Linux/Ubuntu machines. Below is the conda list output for the env I built from environment.yml.
(difgan) zdwang@#######:~$ conda list
# packages in environment at /home/zdwang/miniconda3/envs/difgan:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
appdirs 1.4.4 pyhd3eb1b0_0
blas 1.0 mkl
brotlipy 0.7.0 py39h27cfd23_1003
ca-certificates 2023.01.10 h06a4308_0
certifi 2022.12.7 py39h06a4308_0
cffi 1.15.1 py39h5eee18b_3
charset-normalizer 2.0.4 pyhd3eb1b0_0
click 8.0.4 py39h06a4308_0
cryptography 39.0.1 py39h9ce1e76_0
cudatoolkit 11.6.0 habf752d_9 nvidia
flit-core 3.8.0 py39h06a4308_0
freetype 2.12.1 h4a9f257_0
giflib 5.2.1 h5eee18b_3
idna 3.4 py39h06a4308_0
imageio 2.26.0 py39h06a4308_0
imageio-ffmpeg 0.4.8 pypi_0 pypi
intel-openmp 2021.4.0 h06a4308_3561
jpeg 9e h5eee18b_1
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libdeflate 1.17 h5eee18b_0
libffi 3.4.2 h6a678d5_6
libgcc-ng 11.2.0 h1234567_1
libgfortran-ng 11.2.0 h00389a5_1
libgfortran5 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libpng 1.6.39 h5eee18b_0
libstdcxx-ng 11.2.0 h1234567_1
libtiff 4.5.0 h6a678d5_2
libwebp 1.2.4 h11a3e52_1
libwebp-base 1.2.4 h5eee18b_1
lz4-c 1.9.4 h6a678d5_0
mkl 2021.4.0 h06a4308_640
mkl-service 2.4.0 py39h7f8727e_0
mkl_fft 1.3.1 py39hd3c417c_0
mkl_random 1.2.2 py39h51133e4_0
ncurses 6.4 h6a678d5_0
ninja 1.10.2 h06a4308_5
ninja-base 1.10.2 hd09550d_5
numpy 1.23.5 py39h14f4228_0
numpy-base 1.23.5 py39h31eccc5_0
openssl 1.1.1t h7f8727e_0
packaging 23.0 py39h06a4308_0
pillow 9.4.0 py39h6a678d5_0
pip 23.0.1 py39h06a4308_0
pooch 1.4.0 pyhd3eb1b0_0
psutil 5.9.0 py39h5eee18b_0
pycparser 2.21 pyhd3eb1b0_0
pyopenssl 23.0.0 py39h06a4308_0
pysocks 1.7.1 py39h06a4308_0
pyspng 0.1.1 pypi_0 pypi
python 3.9.16 h7a1cb2a_2
pytorch 1.12.1 py3.9_cuda11.6_cudnn8.3.2_0 pytorch
pytorch-mutex 1.0 cuda pytorch
readline 8.2 h5eee18b_0
requests 2.28.1 py39h06a4308_1
scipy 1.10.0 py39h14f4228_1
setuptools 65.6.3 py39h06a4308_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.41.1 h5eee18b_0
tk 8.6.12 h1ccaba5_0
tqdm 4.65.0 py39hb070fc8_0
typing_extensions 4.4.0 py39h06a4308_0
tzdata 2022g h04d1e81_0
urllib3 1.26.14 py39h06a4308_0
wheel 0.38.4 py39h06a4308_0
xz 5.2.10 h5eee18b_1
zlib 1.2.13 h5eee18b_0
zstd 1.5.4 hc292b87_0
You could also try this Dockerfile: https://github.com/NVlabs/stylegan2-ada-pytorch/blob/main/Dockerfile
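When comparing two environments, it may help to pull out only the packages that matter for building the CUDA extensions. The following is a minimal sketch, assuming the `conda list` output has been saved as text; `parse_conda_list` is a hypothetical helper, not part of the repository. The sample rows are copied from the listing above, which reports Python 3.9.16 and PyTorch 1.12.1.

```python
def parse_conda_list(text):
    """Return {package_name: version} parsed from `conda list` output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines conda prints.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


# A few rows copied from the listing above.
sample = """\
# Name Version Build Channel
cudatoolkit 11.6.0 habf752d_9 nvidia
ninja 1.10.2 h06a4308_5
python 3.9.16 h7a1cb2a_2
pytorch 1.12.1 py3.9_cuda11.6_cudnn8.3.2_0 pytorch
"""

packages = parse_conda_list(sample)
for name in ("python", "pytorch", "cudatoolkit", "ninja"):
    print(f"{name:12s} {packages.get(name, 'not installed')}")
```

Running the same helper on the output from a failing machine and comparing the two dictionaries should show quickly whether the environments really match.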
I have found my problem: it appears that in my PyTorch installation there is no runtime_api in /usr/local/cuda/, which seems to keep ninja from running. This is confusing because everything works normally on a different machine of mine. Do you know of a solution for the missing runtime_api? I've tried reinstalling different versions, but nothing changed.
I have never run into this error. Could it be related to the CUDA version rather than PyTorch, since you said 'there is no runtime_api in /usr/local/cuda/'?
I got the error "ninja: build stopped" when training diffusion-stylegan2. Because of this, I also got warnings.warn('Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation. Details:\n\n' + traceback.format_exc()) and "No module named upfirdn2d".
My environment is the same as environment.yml, and my gcc version is 9.4.0.
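A quick way to confirm this kind of problem is to check whether the compiler and the header actually exist under the CUDA directory. The sketch below uses only the standard library; `check_cuda_toolkit` is a hypothetical helper, and defaulting to /usr/local/cuda when `CUDA_HOME` is unset is an assumption based on the path mentioned above. As far as I know, the conda `cudatoolkit` package ships the runtime libraries but not `nvcc` or the development headers, so a full CUDA toolkit install is usually what provides them.

```python
import os
import shutil


def check_cuda_toolkit(cuda_home):
    """Report whether nvcc and cuda_runtime_api.h exist under cuda_home."""
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    header = os.path.join(cuda_home, "include", "cuda_runtime_api.h")
    return {
        "cuda_home": cuda_home,
        "nvcc": os.path.isfile(nvcc),
        "cuda_runtime_api.h": os.path.isfile(header),
    }


# Assumption: fall back to /usr/local/cuda (the path mentioned above)
# when the CUDA_HOME environment variable is not set.
cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
report = check_cuda_toolkit(cuda_home)
for key, value in report.items():
    print(f"{key}: {value}")

# Also show whether any nvcc is reachable through PATH.
print("nvcc on PATH:", shutil.which("nvcc"))
```

If both entries come back False, the directory probably contains only part of a CUDA install, and pointing `CUDA_HOME` at a complete toolkit may be enough.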
Traceback (most recent call last):
  File "/opt/conda/envs/difgan/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1808, in _run_ninja_build
    subprocess.run(
  File "/opt/conda/envs/difgan/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/octadion/diffusion-gan/Diffusion-GAN/diffusion-stylegan2/torch_utils/ops/bias_act.py", line 41, in _init
    _plugin = custom_ops.get_plugin('bias_act_plugin', sources=sources, extra_cuda_cflags=['--use_fast_math'])
  File "/home/octadion/diffusion-gan/Diffusion-GAN/diffusion-stylegan2/torch_utils/custom_ops.py", line 103, in get_plugin
    torch.utils.cpp_extension.load(name=module_name, verbose=verbose_build, sources=sources, **build_kwargs)
  File "/opt/conda/envs/difgan/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1202, in load
    return _jit_compile(
  File "/opt/conda/envs/difgan/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1425, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/opt/conda/envs/difgan/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1537, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/opt/conda/envs/difgan/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'bias_act_plugin': [1/3] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/envs/difgan/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_BFLOAT16_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' --use_fast_math -std=c++14 -c /home/octadion/diffusion-gan/Diffusion-GAN/diffusion-stylegan2/torch_utils/ops/bias_act.cu -o bias_act.cuda.o
FAILED: bias_act.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/envs/difgan/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_BFLOAT16_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' --use_fast_math -std=c++14 -c /home/octadion/diffusion-gan/Diffusion-GAN/diffusion-stylegan2/torch_utils/ops/bias_act.cu -o bias_act.cuda.o
/bin/sh: 1: /usr/local/cuda/bin/nvcc: not found
[2/3] c++ -MMD -MF bias_act.o.d -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/envs/difgan/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/octadion/diffusion-gan/Diffusion-GAN/diffusion-stylegan2/torch_utils/ops/bias_act.cpp -o bias_act.o
FAILED: bias_act.o
c++ -MMD -MF bias_act.o.d -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/TH -isystem /opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/envs/difgan/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/octadion/diffusion-gan/Diffusion-GAN/diffusion-stylegan2/torch_utils/ops/bias_act.cpp -o bias_act.o
In file included from /home/octadion/diffusion-gan/Diffusion-GAN/diffusion-stylegan2/torch_utils/ops/bias_act.cpp:10:
/opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/ATen/cuda/CUDAContext.h:5:10: fatal error: cuda_runtime_api.h: No such file or directory
    5 | #include <cuda_runtime_api.h>
      |          ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
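Two separate failures are buried in the log above: the shell cannot find /usr/local/cuda/bin/nvcc, and the C++ compiler cannot find cuda_runtime_api.h. Both suggest that the CUDA toolkit (compiler and headers) is absent from /usr/local/cuda, rather than a problem in PyTorch itself. The sketch below shows one way to pick such lines out of a long ninja log; the `SIGNATURES` table and `diagnose` function are hypothetical and cover only the two messages seen here.

```python
import re

# Hypothetical signature table: (pattern, short diagnosis).
SIGNATURES = [
    (re.compile(r"nvcc: not found"),
     "nvcc is missing at the path the build uses"),
    (re.compile(r"fatal error: (\S+\.h): No such file or directory"),
     "a required header is missing from the include paths"),
]


def diagnose(log_text):
    """Return (matched_text, diagnosis) for each known signature in the log."""
    findings = []
    for pattern, meaning in SIGNATURES:
        match = pattern.search(log_text)
        if match:
            findings.append((match.group(0), meaning))
    return findings


# Lines excerpted from the build log above.
log = """\
/bin/sh: 1: /usr/local/cuda/bin/nvcc: not found
/opt/conda/envs/difgan/lib/python3.9/site-packages/torch/include/ATen/cuda/CUDAContext.h:5:10: fatal error: cuda_runtime_api.h: No such file or directory
ninja: build stopped: subcommand failed.
"""

for matched, meaning in diagnose(log):
    print(f"{matched}\n  -> {meaning}")
```

The final "ninja: build stopped" line is deliberately not in the table, since it only reports that an earlier step failed and says nothing about the cause.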