CharlesShang / DCNv2

Deformable Convolutional Networks v2 with Pytorch
BSD 3-Clause "New" or "Revised" License

RuntimeError: Not compiled with GPU support #82

Open KiedaTamashi opened 3 years ago

KiedaTamashi commented 3 years ago

I get this error when running testcuda.py on a Linux server.

I tested torch.cuda.is_available() and got True. My CUDA version: 10.1. My torch version: 1.4. My Python version: 3.6.9.

It seems to have built successfully:

copying build/lib.linux-x86_64-3.6/_ext.cpython-36m-x86_64-linux-gnu.so ->
Creating /NAS/home01/tanzhenwei/.pyenv/versions/3.6.9/envs/tzwpy/lib/python3.6/site-packages/DCNv2.egg-link (link to .)
DCNv2 0.1 is already the active version in easy-install.pth

Installed /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2_new
Processing dependencies for DCNv2==0.1
Finished processing dependencies for DCNv2==0.1

But I get an error when testing:

True
/usr/local/cuda
Traceback (most recent call last):
  File "/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2_new/testcuda.py", line 255, in <module>
    example_dconv()
  File "/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2_new/testcuda.py", line 175, in example_dconv
    output = dcn(input)
  File "/NAS/home01/tanzhenwei/.pyenv/versions/tzwpy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2_new/dcn_v2.py", line 128, in forward
    self.deformable_groups)
  File "/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2_new/dcn_v2.py", line 31, in forward
    ctx.deformable_groups)
RuntimeError: Not compiled with GPU support (dcn_v2_forward at /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2_new/src/dcn_v2.h:35)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f5f96ec4193 in /NAS/home01/tanzhenwei/.pyenv/versions/tzwpy/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: dcn_v2_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int, int, int, int, int, int, int, int, int) + 0x157 (0x7f5f91a755d7 in /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2_new/_ext.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x17504 (0x7f5f91a82504 in /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2_new/_ext.cpython-36m-x86_64-linux-gnu.so)
...

Could you help me solve this or give some ideas?

shafu0x commented 3 years ago

I have the same problem. Did anyone fix it?

wenjiey2 commented 3 years ago

Were you able to solve this issue? I am facing the same problem with pytorch 1.4.0-py3.6_cuda101_cudnn7_0 and torchvision 0.5.0-py36_cu101. The error is raised by _backend.dcn_v2_forward, where _backend should be the _ext module built by make.sh. I'm not sure whether _ext refers to this _ext.cp36-win_amd64.pyd file, or how to proceed from here.

KiedaTamashi commented 3 years ago

@wenjiey2 @SharifElfouly Hi, I have fixed it. In my case, I was using a virtual env and running the code on a compute node by submitting a task to the server. However, I had installed the env (and built DCNv2) on a node without a GPU, which caused this error.

I solved it by installing everything, including the virtual env, on a node with a GPU, and now it works.

suniash commented 3 years ago

@XiaoSanGit Thank you. Can you please explain in detail how you solved this issue?

allenwu5 commented 3 years ago

I resolved this issue by forcing python setup.py build develop to go through the code at https://github.com/CharlesShang/DCNv2/blob/c7f778f28b84c66d3af2bf16f19148a07051dac1/setup.py#L34-L42

fabrizioschiano commented 2 years ago

@allenwu5, thanks for posting your solution. I tried to replicate it and found that the problem, at least for me, is the following:

torch.cuda.is_available():  True
CUDA_HOME:  None

So if I simply force the code to go through that branch (by removing the "and CUDA_HOME is not None" part of the condition), I get another error:

OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

I am currently trying to work out how to set the CUDA_HOME variable correctly. If, in the meantime, you have time to give us more details, it would be helpful.
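
The situation described above can be illustrated with a small stand-alone sketch. This is not the code from setup.py: choose_build and its return labels are hypothetical names, and the two arguments stand in for torch.cuda.is_available() and the CUDA_HOME value exported by torch.utils.cpp_extension. It assumes, based on the reports in this thread, that the CUDA sources are only compiled when both checks pass.

```python
from typing import Optional


def choose_build(cuda_available: bool, cuda_home: Optional[str]) -> str:
    """Return which kind of extension a guard like the one discussed would build.

    Hypothetical helper: cuda_available stands in for torch.cuda.is_available(),
    cuda_home for the CUDA_HOME value detected at build time.
    """
    if cuda_available and cuda_home is not None:
        return "cuda"      # CUDA sources compiled, GPU kernels available
    return "cpu-only"      # CUDA sources skipped


if __name__ == "__main__":
    # The situation reported above: a GPU is visible, but no toolkit was found.
    print(choose_build(True, None))               # cpu-only
    # After installing the toolkit and exporting CUDA_HOME.
    print(choose_build(True, "/usr/local/cuda"))  # cuda
```

Under this assumption, a GPU being visible to PyTorch is not enough on its own: the build also needs to find a CUDA toolkit, which would explain why removing the CUDA_HOME check only moves the failure to a later step.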

fabrizioschiano commented 2 years ago

After some research, I understood that the problem was that I did not actually have the CUDA toolkit installed.

You can check by running:
nvcc -V

If the command is not found, the CUDA toolkit is not installed (or is not on your PATH).

I followed this guide:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/

And I installed CUDA from the official download page:

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=20.04&target_type=deb_local

After that, I installed the nvidia-cuda-toolkit package with:

sudo apt install nvidia-cuda-toolkit

Then you can do:

export CUDA_HOME=/usr/local/cuda-11

(Before doing so, check that this is the folder in which CUDA is installed on your machine.)

I hope this helps someone else in the same situation.
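
For checking which folder to use, a minimal Python sketch is given below. find_cuda_home is a hypothetical helper, not part of DCNv2 or PyTorch; it assumes the common layout in which nvcc lives at <root>/bin/nvcc and that /usr/local/cuda is the conventional default location. The env and which parameters exist only so the lookup can be tried without a real CUDA install.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def find_cuda_home(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    default: str = "/usr/local/cuda",
) -> Optional[str]:
    """Guess the CUDA install root, or return None if no candidate is found."""
    # 1. An explicitly exported CUDA_HOME wins.
    if env.get("CUDA_HOME"):
        return env["CUDA_HOME"]
    # 2. Otherwise derive the root from nvcc on PATH: <root>/bin/nvcc.
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    # 3. Fall back to the conventional location if it exists.
    if os.path.isdir(default):
        return default
    return None


if __name__ == "__main__":
    print("CUDA_HOME candidate:", find_cuda_home())
```

If this prints None, no toolkit was found in any of the three places, and exporting CUDA_HOME by hand would point at a folder that still needs to be verified.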

jatinkatyal commented 2 years ago

I am stuck on something deeper. Can you help?

$ nvcc -V
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit

$ conda activate compvis36
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Aug_15_21:14:11_PDT_2021
Cuda compilation tools, release 11.4, V11.4.120
Build cuda_11.4.r11.4/compiler.30300941_0

With or without the environment active, this is what I get when I run:

$ sudo apt install nvidia-cuda-toolkit
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-cuda-toolkit : Depends: nvidia-cuda-dev (= 10.1.243-3) but it is not going to be installed
                       Recommends: nsight-compute (= 10.1.243-3)
                       Recommends: nsight-systems (= 10.1.243-3)
E: Unable to correct problems, you have held broken packages.

Any tips on how I can get nvidia-cuda-toolkit version 11.4? The versions listed on https://packages.ubuntu.com/search?keywords=nvidia-cuda-toolkit are all lower.
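
When several toolkits are in play (here 11.4 inside the conda environment and 10.1.243 from apt), it can help to read the release number out of the nvcc -V text programmatically. The sketch below is only an illustration: parse_nvcc_release is a hypothetical helper, and it assumes the "release X.Y" wording shown in the output above.

```python
import re
from typing import Optional, Tuple


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc -V` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 11.4, V11.4.120"
    print(parse_nvcc_release(sample))  # (11, 4)
```

The resulting tuple can then be compared against the version of the toolkit you intend to build with, before running the build.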

jatinkatyal commented 2 years ago

I fixed this by reinstalling CUDA 11.4 using the run file from NVIDIA, but now I am facing different issues that are already reported on the repo, such as an import error for _ext. I am switching to those issue threads now.

TaQuangTu commented 2 years ago

For others coming later: remember to set the CUDA_HOME environment variable. export CUDA_HOME=/path/to/your/cuda/

tranngocphuong89 commented 2 years ago

I resolved this issue by forcing python setup.py build develop go through

https://github.com/CharlesShang/DCNv2/blob/c7f778f28b84c66d3af2bf16f19148a07051dac1/setup.py#L34-L42

It works for me. Thanks.