Closed jiayangshi closed 9 months ago
Hi,
Thanks for your interest. It seems that the issue is due to a missing cuda_runtime. I suggest using the precompiled CUDA in the pytorch package instead of the locally installed one.
I have updated the setup instructions in README.md. Please give it a try and see if it works for you.
Ruyi
Hi,
Thank you for your reply. I tried again and still couldn't sort it out. Do you know how to point the build to the precompiled CUDA in the pytorch package?
After installing according to your README.md, I tried to run python train.py --config ./config/chest_50.yaml, and it reports:
Traceback (most recent call last):
File "/home/shij3/naf_cbct/train.py", line 10, in <module>
from src.trainer import Trainer
File "/home/shij3/naf_cbct/src/trainer.py", line 12, in <module>
from .encoder import get_encoder
File "/home/shij3/naf_cbct/src/encoder/__init__.py", line 1, in <module>
from .hashencoder import HashEncoder
File "/home/shij3/naf_cbct/src/encoder/hashencoder/__init__.py", line 1, in <module>
from .hashgrid import HashEncoder
File "/home/shij3/naf_cbct/src/encoder/hashencoder/hashgrid.py", line 8, in <module>
from .backend import _backend
File "/home/shij3/naf_cbct/src/encoder/hashencoder/backend.py", line 6, in <module>
_backend = load(name='_hash_encoder',
File "/home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1144, in load
return _jit_compile(
File "/home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1357, in _jit_compile
_write_ninja_file_and_build_library(
File "/home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1446, in _write_ninja_file_and_build_library
extra_ldflags = _prepare_ldflags(
File "/home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1554, in _prepare_ldflags
extra_ldflags.append(f'-L{_join_cuda_home("lib64")}')
File "/home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2058, in _join_cuda_home
raise EnvironmentError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
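The OSError above is raised when PyTorch's extension builder cannot determine a CUDA install root. A minimal sketch of that kind of lookup, assuming the order is environment variable first, then an nvcc found on PATH, then a conventional default location. The helper name guess_cuda_home and the /usr/local/cuda fallback are illustrative assumptions, not PyTorch's actual code:

```python
import os
import shutil


def guess_cuda_home(env=None, which=shutil.which, default="/usr/local/cuda"):
    """Return a plausible CUDA root, or None if nothing is found.

    Hypothetical helper: it only approximates the lookup order.
    """
    env = os.environ if env is None else env
    # 1. An explicit environment variable wins.
    root = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    if root:
        return root
    # 2. Otherwise derive the root from an nvcc on PATH (<root>/bin/nvcc).
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    # 3. Fall back to the default location only if it exists.
    return default if os.path.isdir(default) else None


if __name__ == "__main__":
    print("CUDA root guess:", guess_cuda_home())
```

If this prints None inside the activated environment, no variable is set and no nvcc is on PATH, which would be consistent with the traceback.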
Because, as you mentioned, we should use the CUDA that comes with pytorch, I pointed the environment variable at the conda environment with export CUDA_HOME=$CONDA_PREFIX. The error then becomes nvcc not found:
Traceback (most recent call last):
File "/home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1740, in _run_ninja_build
subprocess.run(
File "/home/shij3/anaconda3/envs/naf_test/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/shij3/naf_cbct/train.py", line 10, in <module>
from src.trainer import Trainer
File "/home/shij3/naf_cbct/src/trainer.py", line 12, in <module>
from .encoder import get_encoder
File "/home/shij3/naf_cbct/src/encoder/__init__.py", line 1, in <module>
from .hashencoder import HashEncoder
File "/home/shij3/naf_cbct/src/encoder/hashencoder/__init__.py", line 1, in <module>
from .hashgrid import HashEncoder
File "/home/shij3/naf_cbct/src/encoder/hashencoder/hashgrid.py", line 8, in <module>
from .backend import _backend
File "/home/shij3/naf_cbct/src/encoder/hashencoder/backend.py", line 6, in <module>
_backend = load(name='_hash_encoder',
File "/home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1144, in load
return _jit_compile(
File "/home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1357, in _jit_compile
_write_ninja_file_and_build_library(
File "/home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1469, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1756, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension '_hash_encoder': [1/3] /home/shij3/anaconda3/envs/naf_test/bin/nvcc -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/include -isystem /home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/include/TH -isystem /home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/include/THC -isystem /home/shij3/anaconda3/envs/naf_test/include -isystem /home/shij3/anaconda3/envs/naf_test/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -c /home/shij3/naf_cbct/src/encoder/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o
FAILED: hashencoder.cuda.o
/home/shij3/anaconda3/envs/naf_test/bin/nvcc -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/include -isystem /home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/include/TH -isystem /home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/include/THC -isystem /home/shij3/anaconda3/envs/naf_test/include -isystem /home/shij3/anaconda3/envs/naf_test/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -c /home/shij3/naf_cbct/src/encoder/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o
/bin/sh: 1: /home/shij3/anaconda3/envs/naf_test/bin/nvcc: not found
[2/3] c++ -MMD -MF bindings.o.d -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/include -isystem /home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/include/TH -isystem /home/shij3/anaconda3/envs/naf_test/lib/python3.9/site-packages/torch/include/THC -isystem /home/shij3/anaconda3/envs/naf_test/include -isystem /home/shij3/anaconda3/envs/naf_test/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -c /home/shij3/naf_cbct/src/encoder/hashencoder/src/bindings.cpp -o bindings.o
ninja: build stopped: subcommand failed.
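The log shows ninja invoking nvcc from the bin directory under CUDA_HOME, and the shell reporting that the file is not there. Before starting a build, one could check for that file directly. A small sketch assuming the Linux layout seen in the log; has_nvcc is a hypothetical helper name:

```python
import os


def has_nvcc(cuda_home):
    """True if <cuda_home>/bin/nvcc exists and is executable."""
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    return os.path.isfile(nvcc) and os.access(nvcc, os.X_OK)


if __name__ == "__main__":
    root = os.environ.get("CUDA_HOME", "")
    if not root:
        print("CUDA_HOME is not set")
    elif has_nvcc(root):
        print("nvcc found under", root)
    else:
        print("no nvcc under", root)
```

Setting CUDA_HOME to a directory only changes where the build looks for the compiler; it does not put a compiler there.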
This is where the attempt from my original question comes in: installing nvcc from conda with conda install -c "nvidia/label/cuda-11.3.0" cuda-nvcc. But do you mean we can actually use the nvcc that comes with the installed pytorch? If so, how can I point the build to the precompiled CUDA in the pytorch package?
Hi, yes, we use the nvcc/cuda that comes with the installed pytorch. Pytorch should automatically point to the precompiled cuda if it is correctly installed. I didn't manually specify any variable for cuda. I tried my code on different 30-series GPU systems (even one without a locally installed CUDA) and they all worked fine.
I suggest removing every installed nvcc/cuda from your system and conda environment, then following README.md to create and set up a new environment. Note that nvcc/cuda is already included by the pytorch command pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113. You do not need to install it again with the conda command conda install -c "nvidia/label/cuda-11.3.0" cuda-nvcc. Hope this helps.
Ruyi
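The +cu113 suffix in the pip command names the CUDA version the wheel was built against. If a separate toolkit ends up being used for compiling, comparing its version with that tag is a quick sanity check. A sketch of reading the tag, assuming the cuXYZ convention in which the last digit is the minor version; cuda_version_from_wheel is a hypothetical helper:

```python
import re


def cuda_version_from_wheel(version):
    """Map '1.11.0+cu113' to '11.3'; return None for CPU-only or untagged versions."""
    match = re.search(r"\+cu(\d+)(\d)$", version)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


if __name__ == "__main__":
    print(cuda_version_from_wheel("1.11.0+cu113"))  # 11.3
```

Here the result, 11.3, lines up with the cuda-11.3.0 label used in the conda command from this thread.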
Thank you for your great work. When I tried to run python train.py --config ./config/chest_50.yaml, I encountered an error. The nvcc was installed through conda with conda install -c "nvidia/label/cuda-11.3.0" cuda-nvcc, and the environment variable was set with export CUDA_HOME=$CONDA_PREFIX. I also checked the installation with nvcc --version.
I would like to ask if you know of a potential solution to this. Did you use a locally installed nvcc? Is it possible to use the nvcc from conda? Thank you.