facebookresearch / maskrcnn-benchmark

Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.
MIT License
9.31k stars, 2.49k forks

5 | #include <THC/THC.h> | ^~~~~~~~~~~ compilation terminated. error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1 #1340

Closed Souryadipstan closed 2 years ago

Souryadipstan commented 2 years ago

This error shows up when I run the final python setup.py build develop command.

    5 | #include <THC/THC.h>
      |          ^~~
compilation terminated.
error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1

I am using pytorch 1.11, cudatoolkit 11.3 and python 3.7.13

Souryadipstan commented 2 years ago

Downgrading PyTorch to 1.10.1 solved the problem. Either of the following installs it (pip or conda):

pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

conda install -c pytorch pytorch=1.10.1 torchvision torchaudio
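Before rebuilding, it can help to confirm which side of the THC change the installed PyTorch is on. The following is a minimal sketch, assuming that THC/THC.h is no longer shipped from PyTorch 1.11 onward, which matches the reports in this thread (1.10.1 builds, 1.11 and 1.13.1 do not). The names FIRST_RELEASE_WITHOUT_THC, major_minor and ships_thc_headers are hypothetical helpers, not part of PyTorch.

```python
import re

# Assumption (consistent with this thread): THC/THC.h is gone from 1.11 onward.
FIRST_RELEASE_WITHOUT_THC = (1, 11)


def major_minor(version):
    """Extract (major, minor) from strings such as '1.10.1+cu113'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def ships_thc_headers(version):
    """True if this PyTorch version is older than the assumed THC removal."""
    return major_minor(version) < FIRST_RELEASE_WITHOUT_THC


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        if ships_thc_headers(torch.__version__):
            print(torch.__version__, "is expected to still ship THC headers")
        else:
            print(torch.__version__, "is expected to lack THC/THC.h")
```

A passing check only says the header should be present; it does not guarantee that the nvcc build succeeds, since a CUDA toolkit mismatch can still break it.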

HefengRAY commented 1 year ago

I am using the same version (PyTorch 1.10.1), but the error still shows up: error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1. Do you know how to solve it? Thanks!

zhangj1an commented 1 year ago

I am using PyTorch 1.13.1 and ran into similar THC build failures. I manually edited the CUDA files following NVIDIA's "THC fix for build fails" change, and it works.
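The referenced fix is not reproduced in this thread, so the exact edits are not shown here. As a starting point for the manual editing, this hedged sketch only lists the source files that still include THC headers. The function name find_thc_includes and the set of file suffixes are assumptions for illustration; the maskrcnn_benchmark/csrc directory is taken from the error paths reported in this issue.

```python
import re
from pathlib import Path

# Matches lines such as: #include <THC/THC.h>
THC_INCLUDE = re.compile(r'^\s*#\s*include\s*[<"]THC/')
# Assumed set of suffixes worth scanning in a CUDA extension.
SOURCE_SUFFIXES = {".cu", ".cuh", ".cpp", ".h"}


def find_thc_includes(root):
    """Return (path, line_number, line) for every THC include under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in SOURCE_SUFFIXES or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if THC_INCLUDE.match(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Run from the repository root.
    for path, lineno, line in find_thc_includes("maskrcnn_benchmark/csrc"):
        print(f"{path}:{lineno}: {line}")
```

The script changes nothing on disk; each reported file still has to be edited by hand according to the fix.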

TreezzZ commented 1 year ago

The THC fix also works for me. Thanks!

felixshing commented 1 year ago

The THC fix resolves the THC problem. However, I then ran into the following error:

maskrcnn_benchmark/csrc/cuda/deform_conv_kernel_cuda.cu(363): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (c10::impl::ScalarTypeToCPPTypeT<c10::ScalarType::Half> *, c10::Half)
          detected during instantiation of "void deformable_col2im_gpu_kernel(int, const scalar_t *, const scalar_t *, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, scalar_t *) [with scalar_t=c10::impl::ScalarTypeToCPPTypeT<c10::ScalarType::Half>]" 
(386): here

I tried this, but it does not work on my end: https://forums.developer.nvidia.com/t/atomicadd-not-overloaded-for-c10-half/204474/5

Have you run into this problem before? Thanks!
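When a build produces several diagnostics like the one above, it can be easier to triage them from a saved log. This is a small sketch, assuming the "file(line): error: message" shape seen in this thread; parse_nvcc_errors is a hypothetical helper and it does not fix the atomicAdd overload problem itself.

```python
import re

# Shape assumed from the log above: path(line): error: message
NVCC_ERROR = re.compile(r"^(?P<file>.+?)\((?P<line>\d+)\): error: (?P<message>.*)$")


def parse_nvcc_errors(log):
    """Return (file, line, message) for each error line found in the log text."""
    errors = []
    for raw in log.splitlines():
        match = NVCC_ERROR.match(raw.strip())
        if match:
            errors.append((match["file"], int(match["line"]), match["message"]))
    return errors
```

Lines that only continue a diagnostic, such as "(386): here", are skipped because they carry no "error:" marker.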

TreezzZ commented 1 year ago

Correct. Even if the repo compiles successfully, it still hits many errors at runtime with torch 1.13.1. I then tried a number of combinations of torch, apex, CUDA, etc. Here is my current environment:

OS: Ubuntu 22.04
GPU: A6000

# Install basic packages
conda create -n env1 python=3.8 -y
conda activate env1
conda install -y ipython scipy h5py
pip install ninja yacs cython matplotlib tqdm opencv-python overrides
pip uninstall -y numpy
pip install numpy==1.23.0 # This specific version is required
conda install -y pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch -c conda-forge

# Install pycocotools
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py build_ext install
cd ../.. # Return to the starting directory before cloning the next repo

# Install apex
# You may hit an apex/CUDA version mismatch error here.
# If so, modify apex/setup.py: find the function check_cuda_torch_binary_vs_bare_metal() and add "return" as the first line of the function body to skip the check.
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext
cd ..

# Install maskrcnn_benchmark
# If you get an error about "torch._six.PY3" at runtime, modify maskrcnn_benchmark/utils/imports.py and change "if torch._six.PY3" to "if torch._six.PY37", or simply remove the check.
git clone https://github.com/Idolized22/maskrcnn-benchmark.git
cd maskrcnn-benchmark
python setup.py build develop
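
After running the steps above, a quick check that the pinned versions actually ended up in the environment can save a failed build, since a later conda or pip install may replace an earlier pin. This is a minimal sketch: the pins are copied from the commands above, while PINS, strip_local, find_mismatches and installed_versions are hypothetical names. It relies on importlib.metadata, which is available from Python 3.8.

```python
from importlib import metadata

# Pins copied from the install commands above.
PINS = {"numpy": "1.23.0", "torch": "1.10.1", "torchvision": "0.11.2"}


def strip_local(version):
    """Drop a local build suffix such as '+cu113'."""
    return version.split("+", 1)[0]


def find_mismatches(installed, pins=PINS):
    """Return {package: (expected, found)}; found is None if the package is missing."""
    mismatches = {}
    for name, expected in pins.items():
        found = installed.get(name)
        if found is None or strip_local(found) != expected:
            mismatches[name] = (expected, found)
    return mismatches


def installed_versions(names):
    """Look up the installed version of each distribution, skipping missing ones."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return versions


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(installed_versions(PINS)).items():
        print(f"{name}: expected {expected}, found {found}")
```

Run it inside the activated env1 environment; no output means every pin matches.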

felixshing commented 1 year ago

Yeah, it seems downgrading PyTorch to <=1.10 is the only solution. Thank you for your reply!