ShenhanQian / GaussianAvatars

[CVPR 2024 Highlight] The official repo for "GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians"
https://shenhanqian.github.io/gaussian-avatars

how to install on headless AWS server? #56

Closed: jryebread closed this issue 2 months ago

jryebread commented 6 months ago

Hi, I am on an AWS server. I have successfully run pip install -r requirements.txt, but when I run the train command on a dataset I get an error from nvdiffrast. I think this is because the server is headless and does not have OpenGL installed.

According to https://github.com/3DTopia/LGM/issues/38, I can run nvdiffrast with --force_cuda_rast, but I'm not sure where to add this in the code. Can you help me?

Thank you in advance.

nvidia-smi reports: NVIDIA-SMI 535.161.08, Driver Version: 535.161.08, CUDA Version: 12.2

FULL ERROR BELOW


Optimizing output/UNION10EMOEXP_306_eval_600k
Output folder: output/UNION10EMOEXP_306_eval_600k [22/05 22:21:52]
/opt/conda/envs/a/lib/python3.10/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Traceback (most recent call last):
  File "/opt/conda/envs/a/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2107, in _run_ninja_build
    subprocess.run(
  File "/opt/conda/envs/a/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ec2-user/GaussianAvatars/train.py", line 350, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "/home/ec2-user/GaussianAvatars/train.py", line 40, in training
    mesh_renderer = NVDiffRenderer()
  File "/home/ec2-user/GaussianAvatars/mesh_renderer/__init__.py", line 29, in __init__
    self.glctx = dr.RasterizeGLContext() if use_opengl else dr.RasterizeCudaContext()
  File "/opt/conda/envs/a/lib/python3.10/site-packages/nvdiffrast/torch/ops.py", line 221, in __init__
    self.cpp_wrapper = _get_plugin(gl=True).RasterizeGLStateWrapper(output_db, mode == 'automatic', cuda_device_idx)
  File "/opt/conda/envs/a/lib/python3.10/site-packages/nvdiffrast/torch/ops.py", line 118, in _get_plugin
    torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=opts, extra_cuda_cflags=opts+['-lineinfo'], extra_ldflags=ldflags, with_cuda=True, verbose=False)
  File "/opt/conda/envs/a/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1309, in load
    return _jit_compile(
  File "/opt/conda/envs/a/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1719, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/opt/conda/envs/a/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1832, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/opt/conda/envs/a/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2123, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'nvdiffrast_plugin_gl': [1/4] /opt/conda/envs/a/bin/x86_64-conda-linux-gnu-c++ -MMD -MF torch_rasterize_gl.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin_gl -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda-12.1/include -isystem /opt/conda/envs/a/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -DNVDR_TORCH -c /opt/conda/envs/a/lib/python3.10/site-packages/nvdiffrast/torch/torch_rasterize_gl.cpp -o torch_rasterize_gl.o
FAILED: torch_rasterize_gl.o
/opt/conda/envs/a/bin/x86_64-conda-linux-gnu-c++ -MMD -MF torch_rasterize_gl.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin_gl -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda-12.1/include -isystem /opt/conda/envs/a/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -DNVDR_TORCH -c /opt/conda/envs/a/lib/python3.10/site-packages/nvdiffrast/torch/torch_rasterize_gl.cpp -o torch_rasterize_gl.o
In file included from /opt/conda/envs/a/lib/python3.10/site-packages/nvdiffrast/torch/../common/rasterize_gl.h:16,
                 from /opt/conda/envs/a/lib/python3.10/site-packages/nvdiffrast/torch/torch_rasterize_gl.cpp:12:
/opt/conda/envs/a/lib/python3.10/site-packages/nvdiffrast/torch/../common/glutil.h:36:10: fatal error: EGL/egl.h: No such file or directory
   36 | #include <EGL/egl.h>
      |          ^~~~~~~~~~~
compilation terminated.

[2/4] /opt/conda/envs/a/bin/x86_64-conda-linux-gnu-c++ -MMD -MF glutil.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin_gl -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda-12.1/include -isystem /opt/conda/envs/a/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -DNVDR_TORCH -c /opt/conda/envs/a/lib/python3.10/site-packages/nvdiffrast/common/glutil.cpp -o glutil.o
FAILED: glutil.o
/opt/conda/envs/a/bin/x86_64-conda-linux-gnu-c++ -MMD -MF glutil.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin_gl -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda-12.1/include -isystem /opt/conda/envs/a/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -DNVDR_TORCH -c /opt/conda/envs/a/lib/python3.10/site-packages/nvdiffrast/common/glutil.cpp -o glutil.o
In file included from /opt/conda/envs/a/lib/python3.10/site-packages/nvdiffrast/common/glutil.cpp:14:
/opt/conda/envs/a/lib/python3.10/site-packages/nvdiffrast/common/glutil.h:36:10: fatal error: EGL/egl.h: No such file or directory
   36 | #include <EGL/egl.h>
      |          ^~~~~~~~~~~
compilation terminated.
[3/4] /opt/conda/envs/a/bin/x86_64-conda-linux-gnu-c++ -MMD -MF rasterize_gl.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin_gl -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda-12.1/include -isystem /opt/conda/envs/a/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -DNVDR_TORCH -c /opt/conda/envs/a/lib/python3.10/site-packages/nvdiffrast/common/rasterize_gl.cpp -o rasterize_gl.o
FAILED: rasterize_gl.o
/opt/conda/envs/a/bin/x86_64-conda-linux-gnu-c++ -MMD -MF rasterize_gl.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin_gl -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/envs/a/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda-12.1/include -isystem /opt/conda/envs/a/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -DNVDR_TORCH -c /opt/conda/envs/a/lib/python3.10/site-packages/nvdiffrast/common/rasterize_gl.cpp -o rasterize_gl.o
In file included from /opt/conda/envs/a/lib/python3.10/site-packages/nvdiffrast/common/rasterize_gl.h:16,
                 from /opt/conda/envs/a/lib/python3.10/site-packages/nvdiffrast/common/rasterize_gl.cpp:9:
/opt/conda/envs/a/lib/python3.10/site-packages/nvdiffrast/common/glutil.h:36:10: fatal error: EGL/egl.h: No such file or directory
   36 | #include <EGL/egl.h>
      |          ^~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.

ShenhanQian commented 6 months ago

Given the error "EGL/egl.h: No such file or directory", you may try sudo apt-get install libegl1-mesa-dev.
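
On the original question of where a CUDA rasterizer would be selected: the traceback points at mesh_renderer/__init__.py, line 29, where the context is created as dr.RasterizeGLContext() if use_opengl else dr.RasterizeCudaContext(). The sketch below shows that choice with a fallback to CUDA when the OpenGL plugin cannot be built. make_raster_context is a hypothetical helper, not part of the repo, and whether use_opengl can be passed to NVDiffRenderer() from train.py is an assumption to check in the source.

```python
import warnings


def make_raster_context(prefer_opengl, gl_factory, cuda_factory):
    """Return a rasterizer context, falling back to CUDA if OpenGL setup fails.

    gl_factory and cuda_factory are zero-argument callables. With nvdiffrast
    they would be dr.RasterizeGLContext and dr.RasterizeCudaContext.
    (Hypothetical helper for illustration; not part of GaussianAvatars.)
    """
    if prefer_opengl:
        try:
            return gl_factory()
        except RuntimeError as err:
            # The log in this thread fails with RuntimeError while building
            # the 'nvdiffrast_plugin_gl' extension, so that is what we catch.
            warnings.warn(f"OpenGL rasterizer unavailable, using CUDA instead: {err}")
    return cuda_factory()


# Intended use on a headless machine (requires nvdiffrast and a CUDA GPU):
#
#   import nvdiffrast.torch as dr
#   glctx = make_raster_context(False, dr.RasterizeGLContext, dr.RasterizeCudaContext)
```

Passing the two constructors in as arguments keeps the helper importable on machines without nvdiffrast. If the repo's renderer does accept the flag, NVDiffRenderer(use_opengl=False) at train.py line 40 would likely have the same effect without any helper.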

jryebread commented 6 months ago

I am on CentOS (Amazon Linux), and it looks like the equivalent package is here: https://centos.pkgs.org/7/centos-x86_64/mesa-libGL-devel-18.3.4-10.el7.x86_64.rpm.html. After installing it, though, the EGL error was still there. I guess I will need to try an Ubuntu instance instead.

When you tested training the model, was it on a headless server or a normal Linux PC? I'm worried I won't be able to get it to run at all on a headless machine.

ShenhanQian commented 6 months ago

I often run this repo on a headless remote server with Ubuntu. It is also possible to run the GUI with X11 forwarding.

Usually, I first get glxgears running on the machine to make sure the OpenGL components are ready, then start setting up the repo.
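
A rough preflight check along these lines can be scripted before setting up the repo. The sketch below only tests whether glxgears is on the PATH and whether EGL/egl.h exists under a few include directories. check_opengl_prereqs and the default directory list are assumptions for illustration. The build log in this thread uses the conda compiler (x86_64-conda-linux-gnu-c++), whose header search path may differ from the system one, so a header found here is not guaranteed to be visible to that compiler.

```python
import shutil
from pathlib import Path


def check_opengl_prereqs(include_dirs=("/usr/include", "/usr/local/include")):
    """Report whether basic OpenGL/EGL build prerequisites appear to be present.

    include_dirs is an assumed list of places to look for EGL/egl.h; adjust it
    to match the include path of the compiler that builds nvdiffrast.
    """
    return {
        # glxgears is the small demo mentioned above as a sanity check.
        "glxgears_on_path": shutil.which("glxgears") is not None,
        # EGL/egl.h is the header the failed build could not find.
        "egl_header_found": any(
            (Path(d) / "EGL" / "egl.h").is_file() for d in include_dirs
        ),
    }


if __name__ == "__main__":
    for name, ok in check_opengl_prereqs().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If egl_header_found is false on an RPM-based system, the header usually ships in a separate EGL development package (commonly named mesa-libEGL-devel) rather than in mesa-libGL-devel. Treat that package name as something to verify for the distribution in use.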

yosun commented 5 months ago

Hmm, what happened to this repo? Why has it been deleted?