Open Larescool opened 1 year ago
Can you explain more about your setup? There is no release of PyTorch with CUDA 12.1. Did you build PyTorch from source? If so, building pytorch3d from source in the same environment should work.
On Arch Linux, the latest packages are:
...Although, I personally use a non-system install of torch inside a virtual environment, which seems to work with my OS without any further changes. pip install torch seems to download the wheel torch-2.0.0-cp311-cp311-manylinux1_x86_64.whl, and everything seems to work without rebuilding specifically for CUDA 12.1. No error when running PyTorch code. Perhaps this is because torch installs the dependency nvidia-cuda-runtime-cu11. This suggests that the system CUDA runtime isn't being used here.
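One way to see which CUDA version each side reports is to compare the toolkit that nvcc belongs to with the version string PyTorch was built against. The sketch below is only an illustration and is not part of PyTorch or PyTorch 3D: parse_nvcc_release and same_cuda_version are hypothetical helper names, and the sample text assumes nothing more than that the nvcc --version output contains a "release X.Y" token.

```python
import re
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Return the 'X.Y' release found in `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def same_cuda_version(toolkit: Optional[str], torch_cuda: Optional[str]) -> bool:
    """Compare major.minor of the toolkit with the CUDA version PyTorch was built with."""
    if toolkit is None or torch_cuda is None:
        return False
    return toolkit.split(".")[:2] == torch_cuda.split(".")[:2]


# Trimmed, assumed sample; only the "release X.Y" token matters here.
sample = "release 12.1"
toolkit_version = parse_nvcc_release(sample)
print(toolkit_version, same_cuda_version(toolkit_version, "11.7"))
```

In a real environment, the first argument would come from running nvcc --version (for example via subprocess.run) and the second from torch.version.cuda.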
Installing PyTorch 3D via git still leads to this error:
$ pip install "git+https://github.com/facebookresearch/pytorch3d.git"
Building wheels for collected packages: pytorch3d
Building wheel for pytorch3d (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [290 lines of output]
/tmp/pip-req-build-ow6c0gf_/setup.py:84: UserWarning: The environment variable `CUB_HOME` was not found. NVIDIA CUB is required for compilation and can be downloaded from `https://github.com/NVIDIA/cub/releases`. You can unpack it to a location of your choice and set the environment variable `CUB_HOME` to the folder containing the `CMakeListst.txt` file.
warnings.warn(
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-311
creating build/lib.linux-x86_64-cpython-311/pytorch3d
[...]
copying pytorch3d/datasets/shapenet/shapenet_synset_dict_v1.json -> build/lib.linux-x86_64-cpython-311/pytorch3d/datasets/shapenet
copying pytorch3d/datasets/r2n2/r2n2_synset_dict.json -> build/lib.linux-x86_64-cpython-311/pytorch3d/datasets/r2n2
running build_ext
Traceback (most recent call last):
[...]
File "/home/mulhaq/.cache/pypoetry/virtualenvs/compressai-trainer-KZXCvdxM-py3.11/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 499, in build_extensions
_check_cuda_version(compiler_name, compiler_version)
File "/home/mulhaq/.cache/pypoetry/virtualenvs/compressai-trainer-KZXCvdxM-py3.11/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 386, in _check_cuda_version
raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError:
The detected CUDA version (12.1) mismatches the version that was used to compile
PyTorch (11.7). Please make sure to use the same CUDA versions.
From what I can tell, this is because PyTorch is happy with the non-system CUDA 11.7 runtime but unhappy with CUDA 12.1 being used for compilation. Thus, it's not really PyTorch 3D's fault. Possible workarounds:
- Set CUDA_HOME=/path/to/cuda-11.7.
- Patch torch.utils.cpp_extension._check_cuda_version to ignore this error.
- Use torch built with CUDA 12.1.
Using the system site-packages version of PyTorch 2.0.0 with CUDA 12.1, I get other errors when building PyTorch 3D from git:
$ pip install "git+https://github.com/facebookresearch/pytorch3d.git"
[...]
running build_ext
/home/mulhaq/.cache/pypoetry/virtualenvs/compressai-trainer-KZXCvdxM-py3.11/lib/python3.11/site-packages/torch/utils/cpp_extension.py:398: UserWarning: There are no g++ version bounds defined for CUDA version 12.1
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
building 'pytorch3d._C' extension
creating /tmp/pip-req-build-06jphomw/build/temp.linux-x86_64-cpython-311
creating /tmp/pip-req-build-06jphomw/build/temp.linux-x86_64-cpython-311/tmp
creating /tmp/pip-req-build-06jphomw/build/temp.linux-x86_64-cpython-311/tmp/pip-req-build-06jphomw
creating /tmp/pip-req-build-06jphomw/build/temp.linux-x86_64-cpython-311/tmp/pip-req-build-06jphomw/pytorch3d
creating /tmp/pip-req-build-06jphomw/build/temp.linux-x86_64-cpython-311/tmp/pip-req-build-06jphomw/pytorch3d/csrc
creating /tmp/pip-req-build-06jphomw/build/temp.linux-x86_64-cpython-311/tmp/pip-req-build-06jphomw/pytorch3d/csrc/ball_query
[...]
Emitting ninja build file /tmp/pip-req-build-06jphomw/build/temp.linux-x86_64-cpython-311/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/67] /opt/cuda/bin/nvcc -DWITH_CUDA -DTHRUST_IGNORE_CUB_VERSION_CHECK -I/tmp/pip-req-build-06jphomw/pytorch3d/csrc -I/home/mulhaq/.cache/pypoetry/virtualenvs/compressai-trainer-KZXCvdxM-py3.11/lib/python3.11/site-packages/torch/include -I/home/mulhaq/.cache/pypoetry/virtualenvs/compressai-trainer-KZXCvdxM-py3.11/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/mulhaq/.cache/pypoetry/virtualenvs/compressai-trainer-KZXCvdxM-py3.11/lib/python3.11/site-packages/torch/include/TH -I/home/mulhaq/.cache/pypoetry/virtualenvs/compressai-trainer-KZXCvdxM-py3.11/lib/python3.11/site-packages/torch/include/THC -I/opt/cuda/include -I/home/mulhaq/.cache/pypoetry/virtualenvs/compressai-trainer-KZXCvdxM-py3.11/include -I/usr/include/python3.11 -c -c /tmp/pip-req-build-06jphomw/pytorch3d/csrc/ball_query/ball_query.cu -o /tmp/pip-req-build-06jphomw/build/temp.linux-x86_64-cpython-311/tmp/pip-req-build-06jphomw/pytorch3d/csrc/ball_query/ball_query.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1017"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61
FAILED: /tmp/pip-req-build-06jphomw/build/temp.linux-x86_64-cpython-311/tmp/pip-req-build-06jphomw/pytorch3d/csrc/ball_query/ball_query.o
/opt/cuda/bin/nvcc -DWITH_CUDA -DTHRUST_IGNORE_CUB_VERSION_CHECK -I/tmp/pip-req-build-06jphomw/pytorch3d/csrc -I/home/mulhaq/.cache/pypoetry/virtualenvs/compressai-trainer-KZXCvdxM-py3.11/lib/python3.11/site-packages/torch/include -I/home/mulhaq/.cache/pypoetry/virtualenvs/compressai-trainer-KZXCvdxM-py3.11/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/mulhaq/.cache/pypoetry/virtualenvs/compressai-trainer-KZXCvdxM-py3.11/lib/python3.11/site-packages/torch/include/TH -I/home/mulhaq/.cache/pypoetry/virtualenvs/compressai-trainer-KZXCvdxM-py3.11/lib/python3.11/site-packages/torch/include/THC -I/opt/cuda/include -I/home/mulhaq/.cache/pypoetry/virtualenvs/compressai-trainer-KZXCvdxM-py3.11/include -I/usr/include/python3.11 -c -c /tmp/pip-req-build-06jphomw/pytorch3d/csrc/ball_query/ball_query.cu -o /tmp/pip-req-build-06jphomw/build/temp.linux-x86_64-cpython-311/tmp/pip-req-build-06jphomw/pytorch3d/csrc/ball_query/ball_query.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1017"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61
/usr/include/stdlib.h(141): error: identifier "_Float32" is undefined
extern _Float32 strtof32 (const char *__restrict __nptr,
^
/usr/include/stdlib.h(147): error: identifier "_Float64" is undefined
extern _Float64 strtof64 (const char *__restrict __nptr,
^
/usr/include/stdlib.h(153): error: identifier "_Float128" is undefined
extern _Float128 strtof128 (const char *__restrict __nptr,
^
That's a bunch of missing types, which suggests the warning may be relevant.
UserWarning: There are no g++ version bounds defined for CUDA version 12.1
$ g++ --version | head -n 1
g++ (GCC) 13.1.1 20230429
But the max g++ version for CUDA 12.0 is g++ 12.1. (Not sure about CUDA 12.1.) So presumably, a downgrade of g++ may help...
Luckily, I have an older Python 3.10 virtual environment with PyTorch 3D installed, so I might just use that instead of going further down the rabbit hole...
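The g++ comparison above can be sketched as a small check. This is a minimal illustration, not how PyTorch implements its own version bounds: the helper names are hypothetical, and the table holds only the single CUDA 12.0 bound mentioned in this thread, so other CUDA versions return "unknown" rather than a guess.

```python
import re
from typing import Dict, Optional, Tuple

# Only the CUDA 12.0 entry comes from the discussion above; extend as needed.
MAX_GXX_FOR_CUDA: Dict[str, Tuple[int, ...]] = {"12.0": (12, 1)}


def parse_gxx_version(first_line: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from the first line of `g++ --version`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", first_line)
    return tuple(int(part) for part in match.groups()) if match else None


def gxx_too_new(gxx: Tuple[int, ...], cuda_version: str) -> Optional[bool]:
    """True if g++ exceeds the recorded bound, None if no bound is recorded."""
    bound = MAX_GXX_FOR_CUDA.get(cuda_version)
    if bound is None:
        return None
    # Compare only as many components as the bound specifies.
    return gxx[: len(bound)] > bound


version = parse_gxx_version("g++ (GCC) 13.1.1 20230429")
print(version, gxx_too_new(version, "12.0"), gxx_too_new(version, "12.1"))
```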
I switched to a different version of CUDA to work around this problem.
I encountered this issue when building PyTorch myself without conda on Arch Linux and saw the same error.
a downgrade of g++ may help...
As suggested by YodaEmbedding, I also think setting older versions of CC and CXX may fix this problem:
$ export CC=/usr/bin/gcc-12
$ export CXX=/usr/bin/g++-12
$ python setup.py build
I succeeded in building it myself on Arch Linux with the above hack.
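To find out which older compilers are actually installed before exporting CC and CXX, something like the following could be used. It is a sketch under assumptions: pick_compilers is a hypothetical helper, the gcc-N and g++-N binary names follow the naming used in this thread, and the preference order is arbitrary. It only accepts a version when both the C and C++ compiler exist, so CC and CXX stay on the same release.

```python
import shutil
from typing import Callable, Dict, Iterable, Optional


def pick_compilers(
    versions: Iterable[int],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[Dict[str, str]]:
    """Return CC/CXX paths for the first version where both gcc-N and g++-N exist."""
    for version in versions:
        cc = which(f"gcc-{version}")
        cxx = which(f"g++-{version}")
        if cc and cxx:
            return {"CC": cc, "CXX": cxx}
    return None


# Preference order is an assumption; adjust to what your CUDA toolkit accepts.
overrides = pick_compilers([12, 11, 10])
print(overrides if overrides else "no versioned gcc/g++ pair found on PATH")
```

The which parameter exists only so the lookup can be replaced in tests; by default it uses shutil.which from the standard library.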
Here's what I did to get things working on Arch Linux:
# Install PyTorch 1.13.1:
pip install --force-reinstall torch==1.13.1 torchvision==0.14.1
# Install CUDA 11.7:
paru -S cuda-11.7
# Install gcc10:
gpg --recv-keys 6C35B99309B5FA62 # expired keys from <2019 for gcc10
paru -S gcc10 # --chroot (optional, but may fix some issues)
# Download CUB:
(cd /tmp/ &&
wget https://github.com/NVIDIA/cub/archive/refs/tags/2.1.0.tar.gz -O cub-2.1.0.tar.gz &&
tar xf cub-2.1.0.tar.gz
)
export CUDA_HOME=/opt/cuda-11.7
export CC=/usr/bin/gcc-10
export CXX=/usr/bin/g++-10
export CUB_HOME=/tmp/cub-2.1.0
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
Note that building (compiling+testing) gcc10 took around 10 hours on my i5 6500.
Also, I wrote the cuda-11.7 PKGBUILD based on the cuda-11.1 PKGBUILD from AUR, so there may (or may not) be issues with it.
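Since the recipe above depends on four environment variables pointing at real paths, a quick pre-flight check can catch a typo before a long build starts. The sketch below is an assumption-laden illustration: missing_build_paths is a hypothetical helper, and the values are simply the ones exported in the recipe.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping


def missing_build_paths(env: Mapping[str, str]) -> List[str]:
    """List the build variables that are unset or point at a path that does not exist."""
    required = ("CUDA_HOME", "CC", "CXX", "CUB_HOME")
    return [name for name in required if not env.get(name) or not Path(env[name]).exists()]


# Values copied from the recipe above.
build_env: Dict[str, str] = {
    "CUDA_HOME": "/opt/cuda-11.7",
    "CC": "/usr/bin/gcc-10",
    "CXX": "/usr/bin/g++-10",
    "CUB_HOME": "/tmp/cub-2.1.0",
}

problems = missing_build_paths(build_env)
if problems:
    print("fix these before building:", ", ".join(problems))
else:
    # Merge with the current environment so PATH and other variables are preserved.
    full_env = {**os.environ, **build_env}
    print("all paths exist; full_env can be passed as env= to subprocess.run")
```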
I had the same issue and found another solution.
When installing from a local clone, before running pip install -e ., go into the setup.py file in the pytorch3d dir and replace c++14 with c++17 in line 52 (extra_compile_args = {"cxx": ["-std=c++17"]}) and line 77 (nvcc_args.append("-std=c++17")).
Running pip will now compile everything using c++17. I tried a few functions and could not find any unwanted behavior, though ymmv.
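The same edit can be expressed as a plain text substitution, which avoids relying on line numbers that may shift between revisions of setup.py. This is a minimal sketch: bump_cxx_standard is a hypothetical helper, and the sample text is modelled on the two lines quoted above rather than copied from the actual file.

```python
def bump_cxx_standard(setup_source: str, old: str = "c++14", new: str = "c++17") -> str:
    """Replace every `-std=<old>` flag in the given setup.py text with `-std=<new>`."""
    return setup_source.replace(f"-std={old}", f"-std={new}")


# Two lines modelled on the ones quoted above, before the change.
before = (
    'extra_compile_args = {"cxx": ["-std=c++14"]}\n'
    'nvcc_args.append("-std=c++14")\n'
)
print(bump_cxx_standard(before))
```

To apply it to a clone, the text could be read and written back with pathlib.Path("setup.py").read_text() and write_text(); if the file already uses c++17, the function returns it unchanged.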
Thanks @DKatz96. I confirm that c++17 is now set by default in a local clone, so the installation instructions given by pytorch3d work as written. The following solved the problem in my case.
git clone https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d && pip install -e .
This worked for me on Ubuntu, with CUDA 12.1 used for everything:
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
You can try my repository for building packages and PyPI simple index and see if it works for you: https://github.com/facebookresearch/pytorch3d/discussions/1752
When setting up pytorch3d-0.7.3, I ran into this:
The detected CUDA version (12.1) mismatches the version that was used to compile PyTorch (11.8). Please make sure to use the same CUDA versions.
Is there any solution for the newest CUDA version (12.1)?