Closed. wangjksjtu closed this issue 2 years ago.
What's your GPU hardware architecture? Currently the code uses `atomicAdd` for `__half`, which is only available on GPUs with compute capability >= 70. A temporary solution is to comment out that function here and its use here, and to make sure `level_dim` is even (though a minimum compute capability of 60 is still needed for `__half2`).
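For context, the `__half` overload of `atomicAdd` requires compute capability 7.0, while the `__half2` overload has existed since 6.0; packing two half values per atomic is why `level_dim` must be even for the workaround. A minimal sketch of the idea, with hypothetical names (not the repository's kernel):

```cuda
#include <cuda_fp16.h>

// Sketch: accumulate gradients through the __half2 overload of atomicAdd,
// available from compute capability 6.0, instead of the __half overload,
// which requires 7.0. Each atomic updates a pair of half values, so the
// buffer length (level_dim) must be even.
__global__ void accumulate_grad_half2(__half2* grad, const __half2* delta, int n_pairs) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_pairs) {
        atomicAdd(&grad[i], delta[i]);  // one atomic updates two __half lanes
    }
}
```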
I met a similar error.
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "train_nerf.py", line 3, in <module>
from nerf.network import NeRFNetwork
File "/data/new_disk70/wangla/tmp/torch-ngp/nerf/network.py", line 9, in <module>
from encoding import get_encoder
File "/data/new_disk70/wangla/tmp/torch-ngp/encoding.py", line 6, in <module>
from hashencoder import HashEncoder
File "/data/new_disk70/wangla/tmp/torch-ngp/hashencoder/__init__.py", line 1, in <module>
from .hashgrid import HashEncoder
File "/data/new_disk70/wangla/tmp/torch-ngp/hashencoder/hashgrid.py", line 8, in <module>
from .backend import _backend
File "/data/new_disk70/wangla/tmp/torch-ngp/hashencoder/backend.py", line 6, in <module>
_backend = load(name='_hash_encoder',
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1124, in load
return _jit_compile(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1337, in _jit_compile
_write_ninja_file_and_build_library(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1449, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension '_hash_encoder': [1/3] :/usr/local/cuda-11.3/bin/nvcc -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem :/usr/local/cuda-11.3/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -c /data/new_disk70/wangla/tmp/torch-ngp/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o
FAILED: hashencoder.cuda.o
:/usr/local/cuda-11.3/bin/nvcc -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem :/usr/local/cuda-11.3/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -c /data/new_disk70/wangla/tmp/torch-ngp/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o
/bin/sh: 1: :/usr/local/cuda-11.3/bin/nvcc: not found
[2/3] c++ -MMD -MF bindings.o.d -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem :/usr/local/cuda-11.3/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -c /data/new_disk70/wangla/tmp/torch-ngp/hashencoder/src/bindings.cpp -o bindings.o
ninja: build stopped: subcommand failed.
Even if I comment out those two lines, the same error still occurs.
More info:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.10.2+cu113'
I am using an RTX 3090.
@aoliao12138 The error message says `/bin/sh: 1: :/usr/local/cuda-11.3/bin/nvcc: not found`. Have you added the CUDA `bin` directory to your `PATH`? (e.g., `export PATH="/usr/local/cuda/bin:$PATH"`)
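For reference, a typical setup looks like the following; the toolkit location `/usr/local/cuda-11.3` is an assumption, adjust it to your install. The stray leading `:` in the failing command also suggests the build picked up a malformed `CUDA_HOME`, so setting it explicitly may help:

```shell
# Point the shell and the build at the CUDA toolkit
# (adjust /usr/local/cuda-11.3 to your actual install location).
export CUDA_HOME=/usr/local/cuda-11.3
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"

# nvcc should now resolve; if not, the toolkit path above is wrong
command -v nvcc || echo "nvcc still not on PATH"
```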
@ashawkey Thank you for the prompt reply! My GPU is a GTX 1080 Ti, so the architecture is 61. It seems to work for me when I comment out that `atomicAdd` function. However, the following issue appears when compiling the fully fused network:
File "train_nerf.py", line 4, in <module>
from nerf.network_ff import NeRFNetwork as NeRFNetwork_FF
File "/home/wangjk/programs/torch-ngp/nerf/network_ff.py", line 10, in <module>
from ffmlp import FFMLP
File "/home/wangjk/programs/torch-ngp/ffmlp/__init__.py", line 1, in <module>
from .ffmlp import FFMLP
File "/home/wangjk/programs/torch-ngp/ffmlp/ffmlp.py", line 10, in <module>
from .backend import _backend
File "/home/wangjk/programs/torch-ngp/ffmlp/backend.py", line 16, in <module>
sources=[os.path.join(_src_path, 'src', f) for f in [
File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1136, in load
keep_intermediates=keep_intermediates)
File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
is_standalone=is_standalone)
File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1452, in _write_ninja_file_and_build_library
error_prefix=f"Error building extension '{name}'")
File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension '_ffmlp': [1/2] /home/wangjk/anaconda3/envs/torch-ngp/bin/nvcc -DTORCH_EXTENSION_NAME=_ffmlp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/wangjk/programs/torch-ngp/ffmlp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/TH -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/THC -isystem /home/wangjk/anaconda3/envs/torch-ngp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -Xcompiler=-mf16c -Xcompiler=-Wno-float-conversion -Xcompiler=-fno-strict-aliasing --extended-lambda --expt-relaxed-constexpr -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -std=c++14 -c /home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu -o ffmlp.cuda.o
FAILED: ffmlp.cuda.o
/home/wangjk/anaconda3/envs/torch-ngp/bin/nvcc -DTORCH_EXTENSION_NAME=_ffmlp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/wangjk/programs/torch-ngp/ffmlp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/TH -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/THC -isystem /home/wangjk/anaconda3/envs/torch-ngp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -Xcompiler=-mf16c -Xcompiler=-Wno-float-conversion -Xcompiler=-fno-strict-aliasing --extended-lambda --expt-relaxed-constexpr -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -std=c++14 -c /home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu -o ffmlp.cuda.o
/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(243): error: explicit type is missing ("int" assumed)
/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(243): error: expected a ")"
/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(344): error: explicit type is missing ("int" assumed)
/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(344): error: expected a ")"
/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(577): error: name followed by "::" must be a class or namespace name
/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(396): error: identifier "output_layout" is undefined
detected during instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]"
(655): here
/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(396): error: name followed by "::" must be a class or namespace name
detected during instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]"
(655): here
/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(60): error: name must be a namespace name
detected during:
instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]"
(564): here
instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]"
(655): here
/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(64): error: identifier "wmma" is undefined
detected during:
instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]"
(564): here
instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]"
(655): here
....
....
85 errors detected in the compilation of "/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu".
ninja: build stopped: subcommand failed.
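For context, the `wmma` and namespace errors above are what nvcc emits when Tensor Core WMMA code (`nvcuda::wmma` from `<mma.h>`) is compiled for an architecture below sm_70: the header's device-side contents are compiled out, so the namespace itself is unknown. WMMA requires compute capability 7.0+, while a 1080 Ti is sm_61. A hypothetical minimal reproducer (not the repository's code):

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;  // compiled with -gencode arch=compute_61 this line
                         // already fails: "name must be a namespace name"

// A bare 16x16x16 Tensor Core tile multiply; every wmma:: symbol below is
// unavailable when __CUDA_ARCH__ < 700, producing errors like the ones above.
__global__ void wmma_demo(const __half* a, const __half* b, float* c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, __half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, __half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;
    wmma::fill_fragment(fc, 0.0f);
    wmma::load_matrix_sync(fa, a, 16);
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}
```

This is why the ffmlp extension cannot build on sm_61 GPUs regardless of the `atomicAdd` workaround; falling back to the non-fused network is the practical option on such hardware.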
Full log here:
Yeah, that is what I am doing now! However, I cannot obtain decent performance. Any thoughts? See issue https://github.com/ashawkey/torch-ngp/issues/5
@wangjksjtu thanks for spotting the bug, I have fixed it!
@ashawkey Thanks for your reply! I solved it.
Closed for now.
Thanks for the nice work! I met the following issue when I ran `python train_nerf.py data/fox --workspace trial_nerf`. Do you have any thoughts? Many thanks for your help!

More info: