RuntimeError: Optimizer config for 'xyz' not found in config file.

chensh1127 commented 6 months ago

I have resolved the previous issue (https://github.com/WU-CVGL/BAD-Gaussians/issues/5) with your guidance, but I encountered a new problem when running the code. I ran:

ns-train bad-gaussians --data /path/my/data/ --pipeline.model.camera-optimizer.mode "cubic" --vis viewer+tensorboard nerfstudio-data --eval_mode "all" --downscale_factor 4

and the error is: RuntimeError: Optimizer config for 'xyz' not found in config file. Make sure you specify an optimizer for each parameter group. Provided configs were: dict_keys(['means', 'features_dc', 'features_rest', 'opacities', 'scales', 'quats', 'camera_opt'])

Did my training fail?

Then I ran:

ns-viewer --load-config /path/outputs/data/config.yml

and the error is:

Error: No checkpoint directory found at outputs/blurtanabata/data/nerfstudio_models,
Please make sure the checkpoint exists, they should be generated periodically during training
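A quick way to tell whether a run ever saved a checkpoint is to look inside its nerfstudio_models directory. The sketch below assumes checkpoints are written there as .ckpt files; has_checkpoints is a hypothetical helper name, not a nerfstudio command.

```shell
# Hypothetical helper (not part of nerfstudio): report whether a run
# directory contains any saved checkpoints.
# Assumption: checkpoints live in <run_dir>/nerfstudio_models as *.ckpt files.
has_checkpoints() {
    ckpt_dir="$1/nerfstudio_models"
    if [ ! -d "$ckpt_dir" ]; then
        echo "missing: $ckpt_dir"
        return 1
    fi
    # Count checkpoint files; an empty directory also means nothing was saved.
    count=$(find "$ckpt_dir" -maxdepth 1 -type f -name '*.ckpt' | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "empty: $ckpt_dir"
        return 1
    fi
    echo "found $count checkpoint(s) in $ckpt_dir"
}
```

For example, has_checkpoints outputs/blurtanabata/data reports whether the directory named in the error exists and holds any checkpoints. A missing or empty directory usually means training stopped before its first save.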

"Did I miss a step? Why are there no checkpoints?"

LingzheZhao commented 6 months ago

Hi, the training did indeed fail. In a previous issue (#2), I updated the parameter names in BAD-Gaussians to follow upstream nerfstudio (https://github.com/nerfstudio-project/nerfstudio/pull/2946/files), so you need to install the latest nerfstudio (FYI, I recently updated the docs with nerfstudio installation instructions):

# install nerfstudio!
git clone https://github.com/nerfstudio-project/nerfstudio
cd nerfstudio
pip install --upgrade pip setuptools
pip install -e .

This may overwrite gsplat (see issue #2), so we also reinstall it:

# make sure old version is uninstalled
pip uninstall gsplat
# install the forked version with camera pose gradient enabled
pip install git+https://github.com/LingzheZhao/gsplat
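After both installs, it may help to confirm which versions actually ended up in the active environment, since installing one package can silently replace the other. The following is a minimal sketch; pkg_version is a hypothetical helper name, and it assumes python3 is the interpreter of the environment you installed into.

```shell
# Hypothetical helper: print "<package> <version>" for an installed package,
# or "<package> not installed" if the active environment does not have it.
# Assumption: python3 on PATH belongs to the environment used for pip install.
pkg_version() {
    python3 -c '
import sys
from importlib import metadata

name = sys.argv[1]
try:
    print(name, metadata.version(name))
except metadata.PackageNotFoundError:
    print(name, "not installed")
' "$1"
}
```

Calling pkg_version nerfstudio and pkg_version gsplat after the steps above shows whether both packages are present in the environment.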

chensh1127 commented 6 months ago

Sorry, my nerfstudio environment has been stuck on a failed compilation of tinycudann, so I haven't had a chance to go through the entire process again.

LingzheZhao commented 6 months ago

> Sorry, my nerfstudio environment has been stuck on a failed compilation of tinycudann, so I haven't had a chance to go through the entire process again.

Oops. Actually, you can skip installing tinycudann, since it is not necessary for 3D-GS. That said, something is evidently wrong with the environment, so I suggest creating a new conda environment; everything should be okay after that.

# create a fresh conda env
conda create --name nerfstudio_new -y python=3.10
conda activate nerfstudio_new

# install dependencies
pip install --upgrade pip setuptools
pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
# (optional) pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch

# install nerfstudio!
git clone https://github.com/nerfstudio-project/nerfstudio
cd nerfstudio
pip install -e .

chensh1127 commented 5 months ago

Sorry, I was busy with other projects recently. I followed your tutorial and succeeded, although I ran into mismatched gcc versions along the way, which I have resolved. However, I now get an error when executing pip install git+https://github.com/LingzheZhao/gsplat. The error is:

Is this a mismatched CUDA version issue?

LingzheZhao commented 5 months ago

Thank you for your continuous interest in our work!

Assuming that you have installed CUDA toolkit 11.8 in a fresh conda environment named badgaussian with conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit as described above, can you try the following:

export CUDA_HOME="/home/<YourUserName>/miniconda3/envs/badgaussian/"
pip install git+https://github.com/LingzheZhao/gsplat

If this works, then you can make this permanent in this conda environment with:

conda env config vars set CUDA_HOME="/home/<YourUserName>/miniconda3/envs/badgaussian/"
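Before retrying the install, it can be worth checking that the chosen CUDA_HOME really contains the nvcc compiler, since CUDA extension builds look for it under bin/. This is a small sketch; check_cuda_home is a hypothetical helper name.

```shell
# Hypothetical helper: confirm that a candidate CUDA_HOME contains nvcc.
# With no argument it checks the current value of $CUDA_HOME.
check_cuda_home() {
    cuda_home="${1:-${CUDA_HOME:-}}"
    if [ -z "$cuda_home" ]; then
        echo "CUDA_HOME is not set"
        return 1
    fi
    if [ -x "$cuda_home/bin/nvcc" ]; then
        echo "nvcc found at $cuda_home/bin/nvcc"
    else
        echo "no nvcc under $cuda_home/bin"
        return 1
    fi
}
```

If check_cuda_home reports that nvcc is missing, the path probably points at the wrong environment, or the cuda-toolkit package was installed somewhere else.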

chensh1127 commented 5 months ago

Sorry, I ran pip install git+https://github.com/LingzheZhao/gsplat and the error is: error-4

The installation process of gsplat always fails.

LingzheZhao commented 5 months ago

Hi, it seems that setuptools failed to detect the architecture of your GPU, so the arch_list is empty. You can try setting this environment variable before running pip install:

export TORCH_CUDA_ARCH_LIST="7.0;7.2;8.0;8.6;8.9"

If this does not work, note that the official gsplat now also supports pose optimization, so you can try that instead (pip install -U gsplat). After installation, it builds the CUDA code on the first run (JIT).
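For the TORCH_CUDA_ARCH_LIST suggestion above, the list can also be narrowed to the GPUs actually present instead of a fixed set. The sketch below only does the formatting step: caps_to_arch_list is a hypothetical helper that turns one compute capability per line into the semicolon-separated form. The commented nvidia-smi call assumes a driver recent enough to support the compute_cap query field.

```shell
# Hypothetical helper: read compute capabilities (one per line, e.g. "8.6")
# from stdin and print them as a deduplicated "a.b;c.d" list.
# Lines that are not of the form <major>.<minor> are dropped.
caps_to_arch_list() {
    tr -d ' \r' | grep -E '^[0-9]+\.[0-9]+$' | sort -u | paste -sd ';' -
}

# Possible usage (assumption: this driver's nvidia-smi supports compute_cap):
# export TORCH_CUDA_ARCH_LIST="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | caps_to_arch_list)"
```

If the resulting value is empty, the query returned nothing usable, and the fixed list from the comment above is the safer choice.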

chensh1127 commented 5 months ago

Thanks a lot. I have resolved the issue with the environment configuration, which was caused by the installation of tiny-cuda-nn.

RuntimeError: Optimizer config for 'xyz' not found in config file. #6