ashawkey / dreamfields-torch

A pytorch implementation of dreamfields with modifications.
MIT License

RuntimeError: Error building extension '_hash_encoder_df' #1

Open entangledothers opened 2 years ago

entangledothers commented 2 years ago

I have tried various environment setups but always get stuck on the following error when running this command:

OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=1 python main_nerf.py "cthulhu" --workspace trial --cuda_ray --fp16 --gui

Using /home/user/.cache/torch_extensions/py37_cu113 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/user/.cache/torch_extensions/py37_cu113/_hash_encoder_df/build.ninja...
Building extension module _hash_encoder_df...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF bindings.o.d -DTORCH_EXTENSION_NAME=_hash_encoder_df -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/TH -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/THC -isystem /home/user/anaconda3/envs/dreamfields/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -c /home/user/test/dreamfields-torch/hashencoder/src/bindings.cpp -o bindings.o 
[2/3] /usr/bin/nvcc  -DTORCH_EXTENSION_NAME=_hash_encoder_df -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/TH -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/THC -isystem /home/user/anaconda3/envs/dreamfields/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -O3 -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -c /home/user/test/dreamfields-torch/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o 
FAILED: hashencoder.cuda.o 
/usr/bin/nvcc  -DTORCH_EXTENSION_NAME=_hash_encoder_df -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/TH -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/THC -isystem /home/user/anaconda3/envs/dreamfields/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -O3 -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -c /home/user/test/dreamfields-torch/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o 
/usr/include/c++/10/chrono: In substitution of ‘template<class _Rep, class _Period> template<class _Period2> using __is_harmonic = std::__bool_constant<(std::ratio<((_Period2::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)) * (_Period::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den))), ((_Period2::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den)) * (_Period::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)))>::den == 1)> [with _Period2 = _Period2; _Rep = _Rep; _Period = _Period]’:
/usr/include/c++/10/chrono:473:154:   required from here
/usr/include/c++/10/chrono:428:27: internal compiler error: Segmentation fault
  428 |  _S_gcd(intmax_t __m, intmax_t __n) noexcept
      |                           ^~~~~~
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1723, in _run_ninja_build
    env=env)
  File "/home/user/anaconda3/envs/dreamfields/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "main_nerf.py", line 57, in <module>
    from nerf.network import NeRFNetwork
  File "/home/user/test/dreamfields-torch/nerf/network.py", line 5, in <module>
    from encoding import get_encoder
  File "/home/user/test/dreamfields-torch/encoding.py", line 6, in <module>
    from hashencoder import HashEncoder
  File "/home/user/test/dreamfields-torch/hashencoder/__init__.py", line 1, in <module>
    from .hashgrid import HashEncoder
  File "/home/user/test/dreamfields-torch/hashencoder/hashgrid.py", line 9, in <module>
    from .backend import _backend
  File "/home/user/test/dreamfields-torch/hashencoder/backend.py", line 12, in <module>
    sources=[os.path.join(_src_path, 'src', f) for f in [
  File "/home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1136, in load
    keep_intermediates=keep_intermediates)
  File "/home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
    is_standalone=is_standalone)
  File "/home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1452, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_hash_encoder_df'

The only change I made to the repo was adding verbose output to hashencoder. It also makes no difference whether tiny-cuda-nn is installed or not.

ashawkey commented 2 years ago
  1. Could you provide more details about the environment, such as the platform, GPU arch, and CUDA version?
  2. If you can successfully compile tiny-cuda-nn, you could comment out all uses of HashEncoder (e.g., in /home/user/test/dreamfields-torch/encoding.py) and use the hash encoder from tiny-cuda-nn instead by adding the --tcnn flag.
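
As an alternative to commenting lines out by hand, the import could be guarded so that a failed build of the custom extension does not stop the script. The sketch below is an assumption, not code from the repository: `try_import` is a hypothetical helper, and it catches `Exception` broadly because a failed JIT build surfaces as `RuntimeError` rather than `ImportError`.

```python
import importlib


def try_import(module_name, attr_name):
    """Return module_name.attr_name, or None if the import (or JIT build) fails.

    Hypothetical helper for illustration; not part of dreamfields-torch.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # a failed extension build raises RuntimeError
        print(f"[warn] could not import {module_name}: {exc}")
        return None
    return getattr(module, attr_name, None)


# Assumed usage inside encoding.py: fall back to --tcnn when this is None.
HashEncoder = try_import("hashencoder", "HashEncoder")
```

With a guard like this, the script can check `HashEncoder is None` and switch to the tiny-cuda-nn encoder instead of crashing at import time.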
entangledothers commented 2 years ago

Of course!

  1. POP OS (Ubuntu 21.04), A6000 & RTX 6000, CUDA 11.4.
  2. Commented out lines 6, 65 & 66 (let me know if there were any further parts that need commenting out). The process now breaks on raymarching:

OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=0 python main_nerf.py "cthulhu" --workspace trial --cuda_ray --fp16 --tcnn --gui

Namespace(H=800, W=800, aug_copy=8, bound=1, cuda_ray=True, dir_text=False, ff=False, fovy=90, fp16=True, gui=True, h=128, max_ray_batch=4096, max_spp=64, num_rays=4096, num_steps=128, radius=3, seed=0, tau_0=0.5, tau_1=0.8, tau_step=500, tcnn=True, test=False, text='cthulhu', upsample_steps=128, w=128, workspace='trial')
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1723, in _run_ninja_build
    env=env)
  File "/home/user/anaconda3/envs/dreamfields/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "main_nerf.py", line 55, in <module>
    from nerf.network_tcnn import NeRFNetwork
  File "/home/user/dreamfields-torch/nerf/network_tcnn.py", line 6, in <module>
    from .renderer import NeRFRenderer
  File "/home/user/dreamfields-torch/nerf/renderer.py", line 9, in <module>
    import raymarching
  File "/home/user/dreamfields-torch/raymarching/__init__.py", line 1, in <module>
    from .raymarching import *
  File "/home/user/dreamfields-torch/raymarching/raymarching.py", line 9, in <module>
    from .backend import _backend
  File "/home/user/dreamfields-torch/raymarching/backend.py", line 9, in <module>
    sources=[os.path.join(_src_path, 'src', f) for f in [
  File "/home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1136, in load
    keep_intermediates=keep_intermediates)
  File "/home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
    is_standalone=is_standalone)
  File "/home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1452, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_raymarching_df': [1/3] c++ -MMD -MF bindings.o.d -DTORCH_EXTENSION_NAME=_raymarching_df -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/TH -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/THC -isystem /home/user/anaconda3/envs/dreamfields/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -c /home/user/dreamfields-torch/raymarching/src/bindings.cpp -o bindings.o
[2/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=_raymarching_df -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/TH -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/THC -isystem /home/user/anaconda3/envs/dreamfields/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -std=c++14 -c /home/user/dreamfields-torch/raymarching/src/raymarching.cu -o raymarching.cuda.o
FAILED: raymarching.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=_raymarching_df -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/TH -isystem /home/user/anaconda3/envs/dreamfields/lib/python3.7/site-packages/torch/include/THC -isystem /home/user/anaconda3/envs/dreamfields/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -std=c++14 -c /home/user/dreamfields-torch/raymarching/src/raymarching.cu -o raymarching.cuda.o
/home/user/dreamfields-torch/raymarching/src/raymarching.cu(271): warning: variable "d" was declared but never referenced
          detected during instantiation of "void kernel_composite_rays_train_forward(const scalar_t *, const scalar_t *, const scalar_t *, const int *, float, uint32_t, uint32_t, scalar_t *, scalar_t *) [with scalar_t=double]"
(444): here

/home/user/dreamfields-torch/raymarching/src/raymarching.cu(271): warning: variable "d" was declared but never referenced
          detected during instantiation of "void kernel_composite_rays_train_forward(const scalar_t *, const scalar_t *, const scalar_t *, const int *, float, uint32_t, uint32_t, scalar_t *, scalar_t *) [with scalar_t=float]"
(444): here

/home/user/dreamfields-torch/raymarching/src/raymarching.cu(535): warning: variable "near" was declared but never referenced
          detected during instantiation of "void kernel_march_rays(uint32_t, uint32_t, const int *, const scalar_t *, const scalar_t *, const scalar_t *, float, uint32_t, const scalar_t *, float, const scalar_t *, const scalar_t *, scalar_t *, scalar_t *, scalar_t *, uint32_t) [with scalar_t=float]"
(606): here

/usr/include/c++/10/chrono: In substitution of ‘template<class _Rep, class _Period> template<class _Period2> using __is_harmonic = std::__bool_constant<(std::ratio<((_Period2::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)) * (_Period::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den))), ((_Period2::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den)) * (_Period::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)))>::den == 1)> [with _Period2 = _Period2; _Rep = _Rep; _Period = _Period]’:
/usr/include/c++/10/chrono:473:154:   required from here
/usr/include/c++/10/chrono:428:27: internal compiler error: Segmentation fault
  428 |  _S_gcd(intmax_t __m, intmax_t __n) noexcept
      |                           ^~~~~~
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.
ninja: build stopped: subcommand failed.

ashawkey commented 2 years ago

@entangledothers It seems to be caused by the gcc version according to this issue. Could you try with a lower gcc version, such as gcc-9?
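
One way to try a different host compiler without changing the system default is to pass nvcc's `-ccbin` option through the `extra_cuda_cflags` argument of `torch.utils.cpp_extension.load`. The helper below is only a sketch: `cuda_host_compiler_flags` is a hypothetical name, the `g++-9` default is an assumption about what is installed, and how the repository's `backend.py` assembles its own flags is not confirmed here.

```python
import shutil


def cuda_host_compiler_flags(compiler="g++-9"):
    """Build nvcc flags that select `compiler` as the host compiler.

    Returns an empty list when the compiler is not found on PATH, so the
    build falls back to nvcc's default choice.
    """
    path = shutil.which(compiler)
    if path is None:
        return []
    return ["-ccbin", path]


flags = ["-O3", "-std=c++14"] + cuda_host_compiler_flags()
print(flags)
# Assumed usage: torch.utils.cpp_extension.load(..., extra_cuda_cflags=flags)
```

Passing the flag explicitly avoids relying on whichever `gcc` happens to come first on `PATH` when nvcc runs.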

entangledothers commented 2 years ago

> @entangledothers It seems to be caused by the gcc version according to this issue. Could you try with a lower gcc version, such as gcc-9?

Sadly, using gcc-9 (and even gcc-8) made no difference; I get the same issue as above.
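
Since switching compilers made no difference, it may be worth confirming which toolchain the build actually picked up. In the logs above, the failing header path `/usr/include/c++/10/chrono` names the libstdc++ version in use, so a small script can extract that from a saved build log. This is a sketch: `libstdcxx_versions` is a hypothetical helper, and the path pattern assumes the Debian/Ubuntu layout seen in the log.

```python
import re


def libstdcxx_versions(build_log):
    """Return the libstdc++ versions named in /usr/include/c++/<N>/ paths, sorted."""
    found = set(re.findall(r"/usr/include/c\+\+/(\d+(?:\.\d+)*)/", build_log))
    return sorted(found, key=lambda v: [int(p) for p in v.split(".")])


log = "/usr/include/c++/10/chrono:428:27: internal compiler error: Segmentation fault"
print(libstdcxx_versions(log))  # ['10']
```

If the result still lists 10 after installing gcc-9, that would suggest nvcc is still invoking the newer compiler rather than the one intended.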

ashawkey commented 2 years ago

Sorry for the late reply! A major update has been pushed; please try again to see if anything changes.