Closed Samiepapa closed 2 years ago
For reference, my NVIDIA version info is as follows: NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6
The CUDA version was set to 11.5.
$ sudo update-alternatives --config cuda
There are 10 choices for the alternative cuda (providing /usr/local/cuda).
Selection Path Priority Status
------------------------------------------------------------
* 0 /usr/local/cuda-11.5 115 auto mode
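A side note on the two version numbers above: the "CUDA Version" printed by nvidia-smi is the highest CUDA runtime the installed driver supports, not the toolkit that /usr/local/cuda points to, so 11.6 in nvidia-smi and 11.5 in update-alternatives do not contradict each other. Below is a minimal Python sketch that pulls both numbers out of the header line quoted above. The helper name parse_nvidia_smi_header is made up for illustration and only understands this one header format.

```python
import re


def parse_nvidia_smi_header(header: str) -> dict:
    """Extract the driver version and the driver-supported CUDA version.

    Hypothetical helper: it only handles the header format quoted in this
    thread ("Driver Version: X CUDA Version: Y").
    """
    driver = re.search(r"Driver Version:\s*([\d.]+)", header)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", header)
    return {
        "driver": driver.group(1) if driver else None,
        # Upper bound supported by the driver, not the installed toolkit.
        "max_cuda": cuda.group(1) if cuda else None,
    }


header = "NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6"
info = parse_nvidia_smi_header(header)
print(info)
```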
Following the previous issues mentioned here, I changed the CUDA version to 11.0, but then I ran into a different issue:
$ python infer.py -cfg ../configs/icon-filter.yaml -gpu 0 -in_dir ../examples -out_dir ../results
Using /home/yongilcho/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/yongilcho/.cache/torch_extensions/voxelize_cuda/build.ninja...
Building extension module voxelize_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output voxelize_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=voxelize_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/test/.virtualenvs/icon/lib/python3.8/site-packages/torch/include -isystem /home/test/.virtualenvs/icon/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/test/.virtualenvs/icon/lib/python3.8/site-packages/torch/include/TH -isystem /home/test/.virtualenvs/icon/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/test/anaconda3/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++14 -c /home/yongilcho/reposit/ICON/lib/neural_voxelization_layer/cuda/voxelize_cuda.cu -o voxelize_cuda.cuda.o
FAILED: voxelize_cuda.cuda.o
When did you git clone the code? I pushed a new version 9 hours ago. voxelize_cuda is now installed via pip; please check requirements.txt. Also, after changing the CUDA version, remember to install a PyTorch build that is compatible with that CUDA version.
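To check which CUDA version the installed PyTorch was built for, something like the following Python sketch may help. The function name describe_torch_cuda is made up for illustration; the torch calls it uses (torch.version.cuda, torch.cuda.is_available, torch.cuda.get_device_capability, torch.cuda.get_arch_list) are standard, but what they report depends on the local install.

```python
def describe_torch_cuda() -> dict:
    """Collect the CUDA-related facts about the local PyTorch install.

    Hypothetical helper for this thread; returns a small dict rather than
    printing so the values can be compared against the system toolkit.
    """
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the PyTorch binary was built with (None for CPU builds).
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        # (major, minor) compute capability of GPU 0, e.g. (8, 6).
        info["device_capability"] = torch.cuda.get_device_capability(0)
        # Architectures the PyTorch binary itself was compiled for.
        info["compiled_arch_list"] = torch.cuda.get_arch_list()
    return info


report = describe_torch_cuda()
print(report)
```

If built_for_cuda differs from the toolkit selected under /usr/local/cuda, extensions compiled from source (such as bvh-distance-queries) are the likely place for trouble.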
Thanks for your quick feedback. I will update the code and check it again. By the way, on the PyTorch install site, https://pytorch.org/get-started/locally/, I can find PyTorch 1.8.2 only for CUDA 11.1:
pip3 install torch==1.8.2+cu111 torchvision==0.9.2+cu111 torchaudio==0.8.2 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
Do you know of any other install command for CUDA 11.0?
I ran into an issue when installing the packages in requirements.txt. With the CUDA version set to 11.0, I cannot install them because of the following errors.
Building wheel for bvh-distance-queries (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /home/test/.virtualenvs/icon/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-_j98v0fe/setup.py'uild-_j98v0fe/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(bdist_wheel -d /tmp/pip-wheel-a0u0j7ca
cwd: /tmp/pip-req-build-_j98v0fe/
Complete output (180 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/bvh_distance_queries
copying bvh_distance_queries/bvh_search_tree.py -> build/lib.linux-x86_64-3.8/bvh_distance_queries
copying bvh_distance_queries/__init__.py -> build/lib.linux-x86_64-3.8/bvh_distance_queries
copying bvh_distance_queries/mesh_distance.py -> build/lib.linux-x86_64-3.8/bvh_distance_queries
running build_ext
building 'bvh_distance_queries_cuda' extension
creating /tmp/pip-req-build-_j98v0fe/build/temp.linux-x86_64-3.8
creating /tmp/pip-req-build-_j98v0fe/build/temp.linux-x86_64-3.8/src
Emitting ninja build file /tmp/pip-req-build-_j98v0fe/build/temp.linux-x86_64-3.8/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pip-req-build-_j98v0fe/build/temp.linux-x86_64-3.8/src/bvh_cualenvs/icon/lib/python3.8/site-packages/torch/include -I/home/test/.virtualenvs/icon/lib/python3.8/site-packages/torch/include/torch/csrc/api/includeon/lib/python3.8/site-packages/torch/include/TH -I/home/test/.virtualenvs/icon/lib/python3.8/site-packages/torch/include/THC -Iinclude -Icuda-samples/nvs/icon/lib/python3.8/site-packages/torch/include -I/home/test/.virtualenvs/icon/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/ib/python3.8/site-packages/torch/include/TH -I/home/test/.virtualenvs/icon/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/nclude -I/home/test/anaconda3/include/python3.8 -c -c /tmp/pip-req-build-_j98v0fe/src/bvh_cuda_op.cu -o /tmp/pip-req-build-_j98v0fe/build/temp.linux-xDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-TIMINGS=0 -DDEBUG_PRINT=0 -DERROR_CHECKING=1 -DNUM_THREADS=256 -DPROFILING=0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_UILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=bvh_distance_queries_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=
FAILED: /tmp/pip-req-build-_j98v0fe/build/temp.linux-x86_64-3.8/src/bvh_cuda_op.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pip-req-build-_j98v0fe/build/temp.linux-x86_64-3.8/src/bvh_cuda_ops/icon/lib/python3.8/site-packages/torch/include -I/home/test/.virtualenvs/icon/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/ho/python3.8/site-packages/torch/include/TH -I/home/test/.virtualenvs/icon/lib/python3.8/site-packages/torch/include/THC -Iinclude -Icuda-samples/Commonon/lib/python3.8/site-packages/torch/include -I/home/test/.virtualenvs/icon/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/yhon3.8/site-packages/torch/include/TH -I/home/test/.virtualenvs/icon/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/y -I/home/test/anaconda3/include/python3.8 -c -c /tmp/pip-req-build-_j98v0fe/src/bvh_cuda_op.cu -o /tmp/pip-req-build-_j98v0fe/build/temp.linux-x86_64-HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-optionS=0 -DDEBUG_PRINT=0 -DERROR_CHECKING=1 -DNUM_THREADS=256 -DPROFILING=0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIBBI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=bvh_distance_queries_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=comput
nvcc fatal : Unsupported gpu architecture 'compute_86'
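The error above is nvcc from the CUDA 11.0 toolkit rejecting compute_86: that toolkit can target architectures only up to 8.0, and 8.6 support arrived with CUDA 11.1. A small Python sketch of that check follows. The table and the function name nvcc_can_target are illustrative assumptions and cover only the three toolkit releases listed.

```python
# Highest compute capability nvcc can target, per toolkit release.
# Deliberately partial: only releases relevant to this thread are listed.
MAX_ARCH_BY_TOOLKIT = {
    (10, 2): (7, 5),
    (11, 0): (8, 0),
    (11, 1): (8, 6),
}


def nvcc_can_target(toolkit: str, arch: str) -> bool:
    """Return True if nvcc from `toolkit` accepts `arch`.

    `toolkit` looks like "11.0"; `arch` looks like "compute_86" or "sm_86".
    Raises KeyError for toolkit versions missing from the table.
    """
    major, minor = (int(part) for part in toolkit.split(".")[:2])
    digits = arch.split("_")[1]
    # "86" -> (8, 6): the last digit is the minor capability.
    wanted = (int(digits[:-1]), int(digits[-1]))
    return wanted <= MAX_ARCH_BY_TOOLKIT[(major, minor)]


print(nvcc_can_target("11.0", "compute_86"))
print(nvcc_can_target("11.1", "compute_86"))
```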
Finally, I managed to install all packages in requirements.txt with CUDA 11.5. After installing them, I still see the same issue as the one mentioned first, even though I changed the CUDA version to 11.0.
Traceback (most recent call last):
File "infer.py", line 310, in <module>
verts_pr, faces_pr, _ = model.test_single(in_tensor)
File "/home/test/reposit/ICON/apps/ICON.py", line 738, in test_single
sdf = self.reconEngine(opt=self.cfg,
File "/home/test/.virtualenvs/icon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "../lib/common/seg3d_lossless.py", line 148, in forward
return self._forward_faster(**kwargs)
File "../lib/common/seg3d_lossless.py", line 170, in _forward_faster
occupancys = self.batch_eval(coords, **kwargs)
File "../lib/common/seg3d_lossless.py", line 139, in batch_eval
occupancys = self.query_func(**kwargs, points=coords2D)
File "../lib/common/train_util.py", line 338, in query_func
preds = netG.query(features=features,
File "../lib/net/HGPIFuNet.py", line 285, in query
smpl_sdf, smpl_norm, smpl_cmap, smpl_ind = cal_sdf_batch(
File "../lib/dataset/mesh_util.py", line 255, in cal_sdf_batch
residues, normals, pts_cmap, pts_ind = func(
File "/home/test/.virtualenvs/icon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/test/.virtualenvs/icon/lib/python3.8/site-packages/bvh_distance_queries/mesh_distance.py", line 79, in forward
output = self.search_tree(triangles, points)
File "/home/test/.virtualenvs/icon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/test/.virtualenvs/icon/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/test/.virtualenvs/icon/lib/python3.8/site-packages/bvh_distance_queries/bvh_search_tree.py", line 109, in forward
output = BVHFunction.apply(
File "/home/test/.virtualenvs/icon/lib/python3.8/site-packages/bvh_distance_queries/bvh_search_tree.py", line 42, in forward
outputs = bvh_distance_queries_cuda.distance_queries(
RuntimeError: after reduction step 1: cudaErrorInvalidDevice: invalid device ordinal
Regarding the install command for CUDA 11.0: cudatoolkit==11.1 works well on CUDA 11.0, no worries.
You could probably have a look at the Colab setup, where I show the full process of setting it up on Ubuntu with Anaconda.
The current BVH only supports CUDA<=11.0. It is a version modified from torch-mesh-isect, and I have no idea when Vassilis will update it to support the latest CUDA.
Okay, thanks a lot. I will check the Colab setup. By the way, is it limited to Ubuntu 18.04, like Colab?
Nope, it works well on Ubuntu 20.04.
I'm getting the same issue with various versions of PyTorch and CUDA 11.0.
The bvh samples seem to run for the most part (some other issues with Kornia arise).
@jaymefosa
Here is the PyTorch version I used for ICON
Please re-install (uninstall, install) the bvh or PyTorch3D if you changed the version of PyTorch, because these libs are dependent on PyTorch.
Another hint for solving the CUDA version issue is to define the environment variable TORCH_CUDA_ARCH_LIST="8.0" (in .bashrc, for example) before installing PyTorch-related packages.
CUDA 11.0 supports architectures only up to 8.0 (https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/), but Python setuptools can default to arch 8.6 when using CUDA 11.1, which breaks things between pytorch3d, bvh, etc. and produces that annoying 'invalid device ordinal' error.
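If editing .bashrc is not convenient, the same variable can be set just for the install step. A minimal Python sketch, assuming pip is run from the current interpreter; the helper names env_with_arch_list and pip_install are made up for illustration, while TORCH_CUDA_ARCH_LIST itself is the variable PyTorch's extension builder reads.

```python
import os
import subprocess
import sys


def env_with_arch_list(arch_list: str, base_env=None) -> dict:
    """Return a copy of the environment with TORCH_CUDA_ARCH_LIST pinned.

    The calling process's own environment is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TORCH_CUDA_ARCH_LIST"] = arch_list
    return env


def pip_install(args, arch_list="8.0"):
    """Run `python -m pip install <args>` with the architecture list pinned."""
    cmd = [sys.executable, "-m", "pip", "install", *args]
    return subprocess.run(cmd, env=env_with_arch_list(arch_list), check=True)


# Demonstrate only the environment construction; nothing is installed here.
env = env_with_arch_list("8.0", base_env={"PATH": "/usr/bin"})
print(env["TORCH_CUDA_ARCH_LIST"])
```

Calling pip_install(["-r", "requirements.txt"]) would then build the CUDA extensions for arch 8.0 only, which the CUDA 11.0 nvcc accepts.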
@jaymefosa @Samiepapa, I have replaced bvh-distance-queries with PyTorch3D+Kaolin, so you don't need to install it anymore. Also, the CUDA version is no longer limited to 11.0.
After installing all packages, I got results successfully for PIFu and PaMIR, but I hit a runtime error when trying to get the ICON demo result. Could you tell me which setting was wrong?