QianyiWu / objectsdf_plus

:first_quarter_moon: [ICCV'23] Pytorch implementation of "ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces"
https://qianyiwu.github.io/objectsdf++
MIT License
143 stars, 5 forks

Unable to Compile hash encoder #13

Open wongsinglam opened 2 weeks ago

wongsinglam commented 2 weeks ago

Hi,

I am running into a very strange problem related to hash_encoder:

Traceback (most recent call last):
  File "/home//projects/objectsdf_plus/code/training/exp_runner.py", line 62, in <module>
    trainrunner = ObjectSDFPlusTrainRunner(conf=opt.conf,
  File "/home//projects/objectsdf_plus/code/../code/training/objectsdfplus_train.py", line 112, in __init__
    self.model = utils.get_class(self.conf.get_string('train.model_class'))(conf=conf_model)
  File "/home//projects/objectsdf_plus/code/../code/utils/general.py", line 17, in get_class
    m = __import__(module)
  File "/home//projects/objectsdf_plus/code/../code/model/network.py", line 172, in <module>
    from hashencoder.hashgrid import HashEncoder
  File "/home//projects/objectsdf_plus/code/../code/hashencoder/__init__.py", line 1, in <module>
    from .hashgrid import HashEncoder
  File "/home//projects/objectsdf_plus/code/../code/hashencoder/hashgrid.py", line 12, in <module>
    from .backend import _backend
  File "/home//projects/objectsdf_plus/code/../code/hashencoder/backend.py", line 10, in <module>
    _backend = load(name='_hash_encoder',
  File "/mnt/sfs_turbo//miniconda3/envs/objectpp/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/mnt/sfs_turbo//miniconda3/envs/objectpp/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/mnt/sfs_turbo//miniconda3/envs/objectpp/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/mnt/sfs_turbo/*/miniconda3/envs/objectpp/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_hash_encoder'

Environment:

I have already tried CUDA toolkit 11.7.0 and 11.7.1. However, I am using the cuda-toolkit package from the nvidia channel in conda, and I am not sure how that relates to my problem. The objsdf repository works fine on my machine!

QianyiWu commented 2 weeks ago

Hi,

Thanks for your interest in our work. I can't tell what the compile bug is from this log. Would you mind providing more of the log, along with the specs of your environment and system?

You can also refer to the issue here to find a possible solution.

wongsinglam commented 3 days ago

[3/3] c++ hashencoder.cuda.o bindings.o -shared -L/home//Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/home//Applications/miniconda3/envs/gsrec/lib64 -lcudart -o _hash_encoder.so
FAILED: _hash_encoder.so 
c++ hashencoder.cuda.o bindings.o -shared -L/home//Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/home//Applications/miniconda3/envs/gsrec/lib64 -lcudart -o _hash_encoder.so
/usr/bin/ld: cannot find -lcudart: No such file or directory
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home//Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1814, in _run_ninja_build
    env=env)
  File "/home//Applications/miniconda3/envs/gsrec/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 29, in <module>
    from gaussian_renderer import prefilter_voxel, render, network_gui
  File "/home//projects/gsrec/gaussian_renderer/__init__.py", line 17, in <module>
    from scene.gaussian_model_implicit import GaussianModel
  File "/home//projects/gsrec/scene/__init__.py", line 17, in <module>
    from scene.gaussian_model_implicit import GaussianModel
  File "/home//projects/gsrec/scene/gaussian_model_implicit.py", line 47, in <module>
    from hashencoder.hashgrid import HashEncoder
  File "/home//projects/gsrec/hashencoder/__init__.py", line 1, in <module>
    from .hashgrid import HashEncoder
  File "/home//projects/gsrec/hashencoder/hashgrid.py", line 10, in <module>
    from .backend import _backend
  File "/home//projects/gsrec/hashencoder/backend.py", line 21, in <module>
    verbose=True,
  File "/home//Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1214, in load
    keep_intermediates=keep_intermediates)
  File "/home//Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1435, in _jit_compile
    is_standalone=is_standalone)
  File "/home//Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1540, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/home//Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_hash_encoder'

Hi, thank you for your reply. I hit the same problem with the gsrec repository.

I don't have the CUDA toolkit installed on my machine; I only installed the conda package cuda-toolkit from the nvidia channel, not cudatoolkit. It seems that "-lcudart" cannot be found.

I am not sure whether there is any other solution that doesn't require installing CUDA on the machine. Thanks!
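
For readers hitting the same linker error: the failing command passes -L.../envs/gsrec/lib64 -lcudart, so one thing worth checking is whether a libcudart shared library actually exists in the directories being searched. Below is a minimal Python sketch of such a check. The helper name find_cudart and the list of candidate roots are assumptions made for illustration, and the guess that a conda environment may keep the library under lib rather than lib64 should be verified on your own system.

```python
import os
from pathlib import Path


def find_cudart(roots, subdirs=("lib64", "lib")):
    """Return every libcudart.so* file found directly under root/subdir."""
    hits = []
    for root in roots:
        for sub in subdirs:
            directory = Path(root) / sub
            if directory.is_dir():
                hits.extend(sorted(directory.glob("libcudart.so*")))
    return hits


if __name__ == "__main__":
    # Candidate roots are assumptions: the active conda env, CUDA_HOME,
    # and the default system install location.
    candidates = [
        os.environ.get("CUDA_HOME"),
        os.environ.get("CONDA_PREFIX"),
        "/usr/local/cuda",
    ]
    found = find_cudart([c for c in candidates if c])
    if not found:
        print("No libcudart.so* found in the candidate directories")
    for path in found:
        print(path)
```

Note that -lcudart makes the linker look for a file named exactly libcudart.so in its search directories, so a hit that only carries a version suffix, or that sits in a directory not passed with -L, would not by itself satisfy the link step.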

wongsinglam commented 3 days ago

Hi, reporting again.

When I installed cuda-toolkit 11.8 on the real machine, I ran into a new problem:

Detected CUDA files, patching ldflags
Emitting ninja build file ./tmp_build/build.ninja...
Building extension module _hash_encoder...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF bindings.o.d -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -c /home/wsl/projects/gsrec/hashencoder/src/bindings.cpp -o bindings.o 
[2/3] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -std=c++14 -allow-unsupported-compiler -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -c /home/wsl/projects/gsrec/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o 
FAILED: hashencoder.cuda.o 
/usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -std=c++14 -allow-unsupported-compiler -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -c /home/wsl/projects/gsrec/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o 
/usr/include/c++/12/bits/locale_facets_nonio.tcc: In member function ‘_InIter std::time_get<_CharT, _InIter>::get(iter_type, iter_type, std::ios_base&, std::ios_base::iostate&, tm*, const char_type*, const char_type*) const’:
/usr/include/c++/12/bits/locale_facets_nonio.tcc:1477:77: error: invalid type argument of unary ‘*’ (have ‘int’)
 1477 |       if ((void*)(this->*(&time_get::do_get)) == (void*)(&time_get::do_get))
      |                                                                             ^   
/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/pybind11/cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/pybind11/cast.h:951:120: error: expected template-name before ‘<’ token
  951 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                        ^
/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/pybind11/cast.h:951:120: error: expected identifier before ‘<’ token
/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/pybind11/cast.h:951:123: error: expected primary-expression before ‘>’ token
  951 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                           ^
/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/pybind11/cast.h:951:126: error: expected primary-expression before ‘)’ token
  951 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                              ^
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1814, in _run_ninja_build
    env=env)
  File "/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 29, in <module>
    from gaussian_renderer import prefilter_voxel, render, network_gui
  File "/home/wsl/projects/gsrec/gaussian_renderer/__init__.py", line 17, in <module>
    from scene.gaussian_model_implicit import GaussianModel
  File "/home/wsl/projects/gsrec/scene/__init__.py", line 17, in <module>
    from scene.gaussian_model_implicit import GaussianModel
  File "/home/wsl/projects/gsrec/scene/gaussian_model_implicit.py", line 47, in <module>
    from hashencoder.hashgrid import HashEncoder
  File "/home/wsl/projects/gsrec/hashencoder/__init__.py", line 1, in <module>
    from .hashgrid import HashEncoder
  File "/home/wsl/projects/gsrec/hashencoder/hashgrid.py", line 10, in <module>
    from .backend import _backend
  File "/home/wsl/projects/gsrec/hashencoder/backend.py", line 21, in <module>
    verbose=True,
  File "/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1214, in load
    keep_intermediates=keep_intermediates)
  File "/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1435, in _jit_compile
    is_standalone=is_standalone)
  File "/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1540, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_hash_encoder'

QianyiWu commented 3 days ago

Hi, would you mind providing more information about your OS, GPU and your own CUDA version?

wongsinglam commented 3 days ago

Ubuntu 22.04, RTX 3090, NVIDIA-SMI 550.120, cuda-toolkit 11.8 (tried both on the real machine and as the conda package from the nvidia channel).

Hope it helps.

QianyiWu commented 3 days ago

And what is your pytorch version?

wongsinglam commented 3 days ago

I am using the PyTorch version you provided in gsrec. For objsdf_pp, I am also using the PyTorch 2.0.0 you provided.

wongsinglam commented 3 days ago

I was wondering which cuda-toolkit version you are using. It would make it easier for me to track down the problem as well.

wongsinglam commented 2 days ago

Hi, I think it may be related to the C/C++ compiler version: I get different errors when I switch versions. Could you please share the version you use?

QianyiWu commented 2 days ago

I remember my GCC version was not higher than 11 in these projects.
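
A quick way to compare a local setup against that is to ask the host compiler for its version before building. The sketch below is only an illustration: the function names and the MAX_GCC_MAJOR threshold are assumptions (the threshold is taken from the comment above, not from any official compatibility table), and it queries c++ because that is the compiler name the build log invokes.

```python
import re
import shutil
import subprocess

MAX_GCC_MAJOR = 11  # assumption: taken from the maintainer's comment above


def parse_major(version_text):
    """Extract the leading major version number from text like '11.4.0'."""
    match = re.match(r"\s*(\d+)", version_text)
    if match is None:
        raise ValueError(f"cannot parse compiler version: {version_text!r}")
    return int(match.group(1))


def host_compiler_major(compiler="c++"):
    """Return the major version of `compiler`, or None if it is not on PATH."""
    path = shutil.which(compiler)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-dumpversion"], capture_output=True, text=True, check=True
    )
    return parse_major(result.stdout)


if __name__ == "__main__":
    major = host_compiler_major()
    if major is None:
        print("c++ not found on PATH")
    elif major > MAX_GCC_MAJOR:
        print(f"c++ major version {major} is newer than {MAX_GCC_MAJOR}")
    else:
        print(f"c++ major version {major} is within the expected range")
```

In the failing log above, the errors come from headers under /usr/include/c++/12, which suggests the build was picking up a GCC 12 standard library.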

QianyiWu commented 2 days ago

Hi,

I ran into the "cannot find -lcudart" issue in other projects, and I solved it by running export CUDA_HOME=/usr/local/cuda and recompiling. I noticed you faced this issue earlier, so I hope this helps.
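
The same idea can be expressed from Python when exporting the variable in the shell is inconvenient. This is a minimal sketch rather than part of either repository: set_cuda_home is a hypothetical helper, and the assumption is that the variable has to be set before torch.utils.cpp_extension is first imported, since as far as I know that module resolves the CUDA location at import time.

```python
import os
from pathlib import Path


def set_cuda_home(cuda_root="/usr/local/cuda", env=None):
    """Set CUDA_HOME to cuda_root if it contains bin/nvcc; return True on success."""
    env = os.environ if env is None else env
    root = Path(cuda_root)
    if not (root / "bin" / "nvcc").is_file():
        # Leave the environment untouched when the toolkit is not there.
        return False
    env["CUDA_HOME"] = str(root)
    return True


if __name__ == "__main__":
    if set_cuda_home():
        print("CUDA_HOME =", os.environ["CUDA_HOME"])
        # Import the JIT-compiled module only after this point, e.g. the
        # `from hashencoder.hashgrid import HashEncoder` line from the traceback.
    else:
        print("No nvcc under /usr/local/cuda; CUDA_HOME left unchanged")
```

If an earlier failed build left files behind (the log above writes to ./tmp_build), removing that directory before retrying may be needed to force a clean recompile; treat that as an assumption to verify.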

wongsinglam commented 2 days ago

Thank you for your reply. Yes, the issue was related to CUDA. Keeping the C/C++ compiler version under 11 and installing the CUDA toolkit on the real machine solved my problem.