Closed Souryadipstan closed 2 years ago
Downgrading PyTorch to 1.10.1 fixed the build error for me.
pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
conda install -c pytorch pytorch=1.10.1 torchvision torchaudio
I am using the same version, but I still get this error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1. Do you know how to solve it? Thanks!
I am using PyTorch 1.13.1 and ran into similar THC build failures. I manually edited the CUDA files following the nvidia thc fix for build fails, and it works.
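For anyone who would rather script the manual edit than do it by hand, here is a minimal sketch. The substitution table is an assumption based on commonly cited replacements for the removed THC helpers (THCCeilDiv to at::ceil_div, THCudaCheck to AT_CUDA_CHECK); it is not taken from the linked fix, so compare it against that fix before relying on it. The names patch_thc_source and patch_tree are hypothetical helpers invented for this example.

```python
import re
from pathlib import Path

# ASSUMPTION: commonly cited substitutions for THC helpers that newer
# PyTorch releases no longer ship. Verify against the fix you follow.
THC_REPLACEMENTS = [
    (
        re.compile(r"#include\s*<THC/THC\.h>"),
        "#include <ATen/ceil_div.h>\n#include <ATen/cuda/Exceptions.h>",
    ),
    (re.compile(r"\bTHCCeilDiv\b"), "at::ceil_div"),
    (re.compile(r"\bTHCudaCheck\b"), "AT_CUDA_CHECK"),
]


def patch_thc_source(text: str) -> str:
    """Return the source text with the THC names above substituted."""
    for pattern, replacement in THC_REPLACEMENTS:
        text = pattern.sub(replacement, text)
    return text


def patch_tree(root: str) -> list:
    """Rewrite every .cu file under root in place; return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*.cu")):
        original = path.read_text()
        patched = patch_thc_source(original)
        if patched != original:
            path.write_text(patched)
            changed.append(str(path))
    return changed
```

Calling patch_tree("maskrcnn_benchmark/csrc/cuda") would rewrite the files in place, so it is worth running on a clean git checkout and reviewing the diff. Calls that take a THC state argument, such as THCudaMalloc and THCudaFree, are deliberately not handled here and would probably still need hand edits.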
It also works for me. Thanks!
This solution fixes the THC problem. However, I then ran into the following error:
maskrcnn_benchmark/csrc/cuda/deform_conv_kernel_cuda.cu(363): error: no instance of overloaded function "atomicAdd" matches the argument list
argument types are: (c10::impl::ScalarTypeToCPPTypeT<c10::ScalarType::Half> *, c10::Half)
detected during instantiation of "void deformable_col2im_gpu_kernel(int, const scalar_t *, const scalar_t *, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, scalar_t *) [with scalar_t=c10::impl::ScalarTypeToCPPTypeT<c10::ScalarType::Half>]"
(386): here
I tried this, but it did not work on my end: https://forums.developer.nvidia.com/t/atomicadd-not-overloaded-for-c10-half/204474/5
Have you run into this problem before? Thanks!
Correct. Even if the repo compiles successfully, it still hits a large number of errors at runtime with torch 1.13.1. After that, I tried a bunch of combinations of torch, apex, CUDA, etc. Here is my current environment.
OS: Ubuntu 22.04
GPU: A6000
# Install basic packages
conda create -n env1 python=3.8 -y
conda activate env1
conda install -y ipython scipy h5py
pip install ninja yacs cython matplotlib tqdm opencv-python overrides
pip uninstall -y numpy
pip install numpy==1.23.0 # This specific version is required
conda install -y pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch -c conda-forge
# Install pycocotools
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py build_ext install
# Install apex
# You are likely to hit a version mismatch between apex and CUDA.
# To skip the check, edit apex/setup.py: find the function check_cuda_torch_binary_vs_bare_metal() and add "return" as the first line of the function body.
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext
# Install maskrcnn_benchmark
# If you get an error about "torch._six.PY3" when running the code, edit maskrcnn_benchmark/utils/imports.py and change "if torch._six.PY3" to "if torch._six.PY37", or simply remove that check.
git clone https://github.com/Idolized22/maskrcnn-benchmark.git
cd maskrcnn-benchmark
python setup.py build develop
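The apex step above (adding "return" as the first line of check_cuda_torch_binary_vs_bare_metal()) can also be applied with a small script instead of a text editor. This is only a sketch under a few assumptions: the function is defined with a single-line "def ...:" signature, the file uses 4-space indentation, and insert_early_return is a hypothetical helper name made up for this example.

```python
import re


def insert_early_return(source: str, func_name: str) -> str:
    """Insert a bare `return` as the first statement of func_name.

    ASSUMPTIONS: the `def` line fits on one line and ends with ':',
    and the body is indented by 4 spaces relative to the `def`.
    The source is returned unchanged if the function is not found.
    """
    lines = source.splitlines(keepends=True)
    pattern = re.compile(r"^(\s*)def\s+" + re.escape(func_name) + r"\s*\(")
    for index, line in enumerate(lines):
        match = pattern.match(line)
        if match and line.rstrip().endswith(":"):
            body_indent = match.group(1) + "    "
            lines.insert(index + 1, body_indent + "return  # skip the version check\n")
            break
    return "".join(lines)
```

To use it, read apex/setup.py, pass its text and the function name to insert_early_return, and write the result back before running the apex install command. Keep in mind that skipping the check only silences the mismatch error; it does not make mismatched CUDA versions compatible.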
Yeah, it seems that downgrading PyTorch to <=1.10 is the only solution. Thank you for your reply!
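To catch this before starting a long build, a quick version check can help. The sketch below encodes only what this thread reports: 1.10.1 builds, while 1.11 and 1.13.1 fail on THC. The function names are hypothetical, and the (1, 10) cutoff is an assumption drawn from those reports rather than from any official compatibility table.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '1.10.1+cu113'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: " + repr(version))
    return int(match.group(1)), int(match.group(2))


def builds_without_thc_patch(version: str) -> bool:
    """True if this thread suggests the repo builds as-is on this version.

    ASSUMPTION: the cutoff (1, 10) comes from the reports in this thread.
    """
    return parse_major_minor(version) <= (1, 10)
```

In an environment where torch is installed, passing torch.__version__ to builds_without_thc_patch gives a quick yes or no before running python setup.py build develop.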
This error shows up when I run the last command, python setup.py build develop:
    5 | #include <THC/THC.h>
      |          ^~~
compilation terminated.
error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1
I am using PyTorch 1.11, cudatoolkit 11.3, and Python 3.7.13.