ArcticHare105 / SPIN

The official implementation of "Lightweight Image Super-Resolution with Superpixel Token Interaction" (ICCV 2023)
MIT License
73 stars 9 forks

ninja: build stopped: subcommand failed. #2

Open 237014845 opened 11 months ago

237014845 commented 11 months ago

compile cuda source of 'pair_wise_distance' function...
NOTE: if you avoid this process, you make .cu file and compile it following https://pytorch.org/tutorials/advanced/cpp_extension.html
Traceback (most recent call last):
File "/home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1723, in _run_ninja_build
env=env)
File "/home/hs/anaconda3/envs/tgrs1/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "demo_train.py", line 17, in <module>
sr_model = model.Model(args, checkpoint)
File "/media/save_old/mnt/hs_data/SR_work/tgrs/LBNet/codes/model/__init__.py", line 23, in __init__
module = import_module('model.' + args.model.lower())
File "/home/hs/anaconda3/envs/tgrs1/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/media/save_old/mnt/hs_data/SR_work/tgrs/LBNet/codes/model/spin.py", line 7, in <module>
from .pair_wise_distance import PairwiseDistFunction
File "/media/save_old/mnt/hs_data/SR_work/tgrs/LBNet/codes/model/pair_wise_distance.py", line 9, in <module>
"pair_wise_distance", cpp_sources="", cuda_sources=source
File "/home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1285, in load_inline
keep_intermediates=keep_intermediates)
File "/home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
is_standalone=is_standalone)
File "/home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1452, in _write_ninja_file_and_build_library
error_prefix=f"Error building extension '{name}'")
File "/home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'pair_wise_distance': [1/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=pair_wise_distance -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/include -isystem /home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/include/TH -isystem /home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/include/THC -isystem /home/hs/anaconda3/envs/tgrs1/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++14 -c /home/hs/.cache/torch_extensions/py37_cu111/pair_wise_distance/cuda.cu -o cuda.cuda.o
FAILED: cuda.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=pair_wise_distance -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/include -isystem /home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/include/TH -isystem /home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/include/THC -isystem /home/hs/anaconda3/envs/tgrs1/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++14 -c /home/hs/.cache/torch_extensions/py37_cu111/pair_wise_distance/cuda.cu -o cuda.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_86'
[2/3] c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=pair_wise_distance -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/include -isystem /home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/include/TH -isystem /home/hs/anaconda3/envs/tgrs1/lib/python3.7/site-packages/torch/include/THC -isystem /home/hs/anaconda3/envs/tgrs1/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/hs/.cache/torch_extensions/py37_cu111/pair_wise_distance/main.cpp -o main.o
ninja: build stopped: subcommand failed.
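For context: compute_86 is the Ampere architecture, which nvcc only understands from CUDA 11.1 onward, so a stale system /usr/bin/nvcc can fail this way even though the conda env's torch build is cu111. A rough sketch of the usual checks and workarounds (the toolkit path and arch value below are examples, not taken from this repo):

```shell
# Check which toolkit the JIT build actually invokes; compute_86 needs CUDA >= 11.1.
command -v nvcc >/dev/null && nvcc --version | grep release || true

# Workaround 1: point the build at a new-enough toolkit (example path):
export CUDA_HOME=/usr/local/cuda-11.1

# Workaround 2: restrict compilation to an architecture the old toolkit supports:
export TORCH_CUDA_ARCH_LIST="7.5"
```

torch.utils.cpp_extension reads CUDA_HOME to locate nvcc and TORCH_CUDA_ARCH_LIST to choose the -gencode flags, so either variable changes what the ninja step runs.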

wizard1023 commented 10 months ago

I have the same problem, nvcc fatal : Unsupported gpu architecture 'compute_86'. Have you solved it?

JiangYun77 commented 9 months ago

same problem

sym330 commented 9 months ago

same error

Wwwww-disign commented 9 months ago

same error

WadeChiang commented 9 months ago

I encountered the "Unsupported gpu architecture 'compute_89'" error and managed to resolve it using a Docker container. Here's a brief walkthrough of my solution:

I used a Docker container built by DGL (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/dgl). This image primarily includes CUDA 12.2, Torch 2.1.0, and the Deep Graph Library. Since torch 1.10, the THC namespace has been deprecated and migrated into ATen, so within the Docker container I made the following changes to the headers of the pair_wise_distance_cuda_source.py source file:

#include <stdio.h>
#include <math.h>
#include <cuda.h>
#include <cuda_runtime.h>

#define CUDA_NUM_THREADS 256

#include <torch/extension.h>
#include <torch/types.h>
#include <ATen/core/TensorAccessor.h>
#include <ATen/cuda/CUDAContext.h>
#include <ATen/cuda/Atomic.cuh>
#include <ATen/cuda/DeviceUtils.cuh>
// #include <THC/THC.h>
// #include <THC/THCAtomics.cuh>
// #include <THC/THCDeviceUtils.cuh>

After these modifications, I successfully loaded pair_wise_distance_cuda.

I'm not entirely sure why the environment in the DGL image fixes the problem, but I hope this can be of help to others facing a similar issue.

@237014845 @Wwwww-disign @sym330 @wizard1023 @JiangYun77
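The header changes above are a mechanical THC-to-ATen substitution, which can be sketched as a small helper. This is illustrative only: the mapping and function name below are mine, not from the repo, and they cover just the three headers commented out in the snippet.

```python
# Illustrative mapping of the deprecated THC includes to their ATen
# replacements (as of torch >= 1.10, where THC was folded into ATen).
THC_TO_ATEN = {
    "#include <THC/THC.h>": "#include <ATen/cuda/CUDAContext.h>",
    "#include <THC/THCAtomics.cuh>": "#include <ATen/cuda/Atomic.cuh>",
    "#include <THC/THCDeviceUtils.cuh>": "#include <ATen/cuda/DeviceUtils.cuh>",
}

def migrate_headers(cuda_source: str) -> str:
    """Rewrite deprecated THC includes in a CUDA source string to ATen ones."""
    for old, new in THC_TO_ATEN.items():
        cuda_source = cuda_source.replace(old, new)
    return cuda_source
```

The rewritten source string can then be passed to torch.utils.cpp_extension.load_inline exactly as the original pair_wise_distance.py does.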

daduguai commented 2 months ago

(quoting @WadeChiang's solution above)

That works, thanks!