HJ-harry / score-MRI

Apache License 2.0

error when running with a cuda version higher than 10.2 #3

Closed fengfengfeng666 closed 1 year ago

fengfengfeng666 commented 2 years ago

Hi, thanks for sharing such interesting work. I'm trying to run inference on a GPU. The workstation in my lab reports CUDA 11.6: NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6

I created a conda env and installed PyTorch with "conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia". In this environment, I get the following errors:


Traceback (most recent call last):
  File "/home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
    subprocess.run(
  File "/home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "inference_real.py", line 7, in <module>
    from models import ncsnpp
  File "/home/lifeifei/score_Diff/score-MRI/models/ncsnpp.py", line 18, in <module>
    from . import utils, layers, layerspp, normalization
  File "/home/lifeifei/score_Diff/score-MRI/models/layerspp.py", line 20, in <module>
    from . import up_or_down_sampling
  File "/home/lifeifei/score_Diff/score-MRI/models/up_or_down_sampling.py", line 10, in <module>
    from op import upfirdn2d
  File "/home/lifeifei/score_Diff/score-MRI/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/home/lifeifei/score_Diff/score-MRI/op/fused_act.py", line 12, in <module>
    fused = load(
  File "/home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused': [1/3] /usr/local/cuda-11.6/bin/nvcc  -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include -isystem /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/TH -isystem /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-11.6/include -isystem /home/lifeifei/anaconda3/envs/score-POCS/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++14 -c /home/lifeifei/score_Diff/score-MRI/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/usr/local/cuda-11.6/bin/nvcc  -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include -isystem /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/TH -isystem /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-11.6/include -isystem /home/lifeifei/anaconda3/envs/score-POCS/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++14 -c /home/lifeifei/score_Diff/score-MRI/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
/bin/sh: 1: /usr/local/cuda-11.6/bin/nvcc: not found
[2/3] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include -isystem /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/TH -isystem /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-11.6/include -isystem /home/lifeifei/anaconda3/envs/score-POCS/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/lifeifei/score_Diff/score-MRI/op/fused_bias_act.cpp -o fused_bias_act.o
In file included from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8:0,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:11,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/ATen/core/Tensor.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from /home/lifeifei/score_Diff/score-MRI/op/fused_bias_act.cpp:1:
/home/lifeifei/score_Diff/score-MRI/op/fused_bias_act.cpp: In function ‘at::Tensor fused_bias_act(const at::Tensor&, const at::Tensor&, const at::Tensor&, int, int, float, float)’:
/home/lifeifei/score_Diff/score-MRI/op/fused_bias_act.cpp:7:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/home/lifeifei/score_Diff/score-MRI/op/fused_bias_act.cpp:13:5: note: in expansion of macro ‘CHECK_CUDA’
     CHECK_CUDA(input);
     ^
In file included from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/ATen/core/Tensor.h:3:0,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from /home/lifeifei/score_Diff/score-MRI/op/fused_bias_act.cpp:1:
/home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:216:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8:0,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:11,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/ATen/core/Tensor.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from /home/lifeifei/score_Diff/score-MRI/op/fused_bias_act.cpp:1:
/home/lifeifei/score_Diff/score-MRI/op/fused_bias_act.cpp:7:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/home/lifeifei/score_Diff/score-MRI/op/fused_bias_act.cpp:14:5: note: in expansion of macro ‘CHECK_CUDA’
     CHECK_CUDA(bias);
     ^
In file included from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/ATen/core/Tensor.h:3:0,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from /home/lifeifei/score_Diff/score-MRI/op/fused_bias_act.cpp:1:
/home/lifeifei/anaconda3/envs/score-POCS/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:216:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
ninja: build stopped: subcommand failed.

When I run inference in a conda env where PyTorch was installed with "conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch", it runs, but not on the GPU, because the PyTorch in this env cannot use CUDA.

I'd like to ask whether inference should be run with CUDA 11.6.
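
Editorial note: the build log above contains the line `/bin/sh: 1: /usr/local/cuda-11.6/bin/nvcc: not found`, which suggests that the extension build could not find the CUDA compiler at that path. The following is a minimal sketch for checking where `nvcc` is actually available. The function name `nvcc_candidates` is hypothetical, and the choice of the `CUDA_HOME` and `CUDA_PATH` environment variables as places to look is an assumption, not something confirmed in this thread.

```python
import os
import shutil


def nvcc_candidates(cuda_home=None):
    """Return (path, usable) pairs for places where nvcc might live.

    Checks an explicit CUDA directory, then the CUDA_HOME and CUDA_PATH
    environment variables (assumed conventions), then the shell PATH.
    """
    homes = [cuda_home, os.environ.get("CUDA_HOME"), os.environ.get("CUDA_PATH")]
    results = []
    for home in homes:
        if home:
            path = os.path.join(home, "bin", "nvcc")
            usable = os.path.isfile(path) and os.access(path, os.X_OK)
            results.append((path, usable))
    on_path = shutil.which("nvcc")  # None if nvcc is not on PATH
    if on_path:
        results.append((on_path, True))
    return results


if __name__ == "__main__":
    # The directory below is the one that appears in the build log.
    for path, usable in nvcc_candidates("/usr/local/cuda-11.6"):
        print(f"{path}: {'found' if usable else 'missing'}")
```

If every candidate is reported as missing, the environment likely has the CUDA runtime libraries but no compiler, and the JIT build of the `fused` extension cannot succeed until a full CUDA toolkit is installed or pointed to.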

jwen307 commented 1 year ago

I had a similar issue when I installed the latest PyTorch 1.13.0 with cudatoolkit=11.7. I solved it by going to fused_bias_act.cpp line 7 and changing x.type().is_cuda() to x.is_cuda().
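
Editorial note: the one-line change described above can be applied by hand, but a small script makes it repeatable. This is a minimal sketch; the helper name `patch_is_cuda` is hypothetical, and the sample line is the `CHECK_CUDA` macro as it appears in the build log.

```python
import re

# Matches the deprecated accessor chain ".type().is_cuda()".
OLD_CALL = re.compile(r"\.type\(\)\.is_cuda\(\)")


def patch_is_cuda(source: str) -> str:
    """Replace every '.type().is_cuda()' with '.is_cuda()' in C++ source text."""
    return OLD_CALL.sub(".is_cuda()", source)


if __name__ == "__main__":
    macro = '#define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")'
    print(patch_is_cuda(macro))
```

To patch the real file, read `op/fused_bias_act.cpp` as text, pass it through `patch_is_cuda`, and write the result back, keeping a backup copy first.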

I also had some issues with my GCC compiler, which was no longer compatible with PyTorch. It seems to need GCC > 5.0 and GCC < 11.0. I fixed that with conda install -c conda-forge cxx-compiler, which installs GCC 12.1, and then downgraded it with conda install -c conda-forge gcc=10.4.0. After this, I was able to get the inference to use my GPU. I hope this helps.
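
Editorial note: a quick way to see whether the active compiler falls in the range mentioned above is to parse the output of `gcc -dumpversion`. This is a minimal sketch; the bounds (above 5.0, below 11.0) are taken from this comment as an observation, not from official PyTorch documentation, and the function names are hypothetical.

```python
import re
import subprocess


def parse_version(text):
    """Return (major, minor) from a string such as '10.4.0' or '7', else None."""
    match = re.search(r"(\d+)(?:\.(\d+))?", text)
    if not match:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def gcc_in_range(text, low=(5, 0), high=(11, 0)):
    """True if the version is strictly between low and high (assumed bounds)."""
    version = parse_version(text)
    return version is not None and low < version < high


def installed_gcc_version():
    """Return the output of 'gcc -dumpversion', or None if gcc cannot be run."""
    try:
        result = subprocess.run(
            ["gcc", "-dumpversion"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    found = installed_gcc_version()
    if found is None:
        print("gcc not found on PATH")
    else:
        print(f"gcc {found}: in assumed range = {gcc_in_range(found)}")
```

Run it inside the activated conda env, since the compiler on `PATH` there may differ from the system one.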

HJ-harry commented 1 year ago

@fengfengfeng666 Yes, you should install a more recent version of PyTorch with cudatoolkit 11.x or later. I cannot verify that it will work because my environment differs from yours, but I don't expect major problems.
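
Editorial note: before rebuilding the extension, it can help to confirm which CUDA version the installed PyTorch was built against and whether it can see a GPU. This is a minimal sketch; `cuda_report` is a hypothetical helper name, and it only uses `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
def cuda_report():
    """Summarize whether PyTorch is installed and whether it can use CUDA."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None for CPU-only builds, otherwise a string such as "11.6".
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is `None`, or `cuda_available` is `False`, inference will not use the GPU regardless of whether the extension compiles.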

fengfengfeng666 commented 1 year ago

@jwen307 That solved my issue perfectly. Thanks!