
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
MIT License

No Pose, No Problem
Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images

Botao Ye · Sifei Liu · Haofei Xu · Xueting Li · Marc Pollefeys · Ming-Hsuan Yang · Songyou Peng

Paper | Project Page | Online Demo (Coming Soon)


NoPoSplat predicts 3D Gaussians in a canonical space from unposed sparse images,
enabling high-quality novel view synthesis and accurate pose estimation.



Table of Contents
  1. Installation
  2. Pre-trained Checkpoints
  3. Camera Conventions
  4. Datasets
  5. Running the Code
  6. Acknowledgements
  7. Citation

Installation

Our code requires Python 3.10+ and was developed with PyTorch 2.1.2 and CUDA 11.8, but it should also work with newer PyTorch/CUDA versions.

  1. Clone NoPoSplat.

    git clone https://github.com/cvg/NoPoSplat
    cd NoPoSplat
  2. Create the environment. Here we show an example using conda.

    conda create -y -n noposplat python=3.10
    conda activate noposplat
    pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
    pip install -r requirements.txt
  3. Optional: compile the CUDA kernels for RoPE (as in CroCo v2).

    # NoPoSplat relies on RoPE positional embeddings, for which you can compile CUDA kernels for faster runtime.
    cd src/model/encoder/backbone/croco/curope/
    python setup.py build_ext --inplace
    cd ../../../../../..
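Once the steps above are done, a quick sanity check like the following can confirm the environment before training. It is a minimal sketch, not part of the repository: the helper names (`parse_version`, `check_environment`) are hypothetical, and treating the development versions (PyTorch 2.1.2, CUDA 11.8) as hard minimums is our assumption, since newer versions should also work. The PyTorch attributes it reads (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`) are standard.

```python
import re
import sys
from typing import List, Optional, Tuple

# Assumption: the versions the code was developed with are treated as minimums.
MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 1, 2)
MIN_CUDA = (11, 8)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn e.g. '2.1.2+cu118' into (2, 1, 2), dropping local/pre-release suffixes."""
    numbers = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def check_environment(
    python_version: Tuple[int, int],
    torch_version: Optional[str],
    cuda_version: Optional[str],
) -> List[str]:
    """Return human-readable problems; an empty list means the versions look fine."""
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append(f"Python {'.'.join(map(str, python_version))} is older than 3.10")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif parse_version(torch_version) < MIN_TORCH:
        problems.append(f"PyTorch {torch_version} is older than 2.1.2")
    if cuda_version is None:
        problems.append("PyTorch is missing or was built without CUDA support")
    elif parse_version(cuda_version) < MIN_CUDA:
        problems.append(f"CUDA {cuda_version} is older than 11.8")
    return problems


if __name__ == "__main__":
    try:
        import torch

        torch_version, cuda_version = torch.__version__, torch.version.cuda
        has_gpu = torch.cuda.is_available()
    except (ImportError, OSError):
        torch_version, cuda_version, has_gpu = None, None, False
    issues = check_environment(sys.version_info[:2], torch_version, cuda_version)
    if not has_gpu:
        issues.append("no CUDA device is visible")
    print("\n".join(issues) if issues else "Environment looks OK")
```

Run it inside the activated `noposplat` environment; an empty report means the installed versions are at least the ones the code was developed with.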

Pre-trained Checkpoints

Our models are hosted on Hugging Face 🤗

Model name                    Training resolution   Training data
re10k.ckpt                    256x256               re10k
acid.ckpt                     256x256               acid
mixRe10kDl3dv.ckpt            256x256               re10k, dl3dv
mixRe10kDl3dv_512x512.ckpt    512x512               re10k, dl3dv

We assume the downloaded weights are located in the pretrained_weights directory.
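To confirm the downloaded weights are where the commands below expect them, a small check such as this one may help. It is a hypothetical helper, not part of the codebase: it only looks for the four file names from the checkpoint table and assumes the default pretrained_weights directory. The MASt3R weights used for training are not checked, because their file name is not stated here.

```python
from pathlib import Path
from typing import Iterable, List

# File names copied from the checkpoint table; you only need the ones you plan to use.
EXPECTED_CHECKPOINTS = (
    "re10k.ckpt",
    "acid.ckpt",
    "mixRe10kDl3dv.ckpt",
    "mixRe10kDl3dv_512x512.ckpt",
)


def missing_checkpoints(
    weights_dir: str = "pretrained_weights",
    names: Iterable[str] = EXPECTED_CHECKPOINTS,
) -> List[str]:
    """Return the checkpoint names that are not present as files in weights_dir."""
    root = Path(weights_dir)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    print("All checkpoints found" if not missing else "Missing: " + ", ".join(missing))
```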

Camera Conventions

Our camera system is the same as pixelSplat's. The camera intrinsic matrices are normalized (the first row is divided by the image width, and the second row by the image height). The camera extrinsic matrices are OpenCV-style camera-to-world matrices (+X right, +Y down, +Z pointing into the screen, i.e., the direction the camera looks).
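As a minimal sketch of these conventions, the snippet below normalizes a pixel-space intrinsic matrix and projects a world point through an OpenCV-style camera-to-world matrix. The function names are hypothetical and it uses NumPy rather than the repository's PyTorch code; it only illustrates the convention described above. With normalized intrinsics, the projected (u, v) fall in [0, 1] for points inside the image, and multiplying by the width and height recovers pixel coordinates.

```python
import numpy as np


def normalize_intrinsics(K: np.ndarray, width: int, height: int) -> np.ndarray:
    """Divide the first row of a pixel-space K by the width and the second by the height."""
    K_norm = np.asarray(K, dtype=np.float64).copy()
    K_norm[0] /= width   # fx, skew, cx become fractions of the image width
    K_norm[1] /= height  # fy, cy become fractions of the image height
    return K_norm


def project(point_world, c2w: np.ndarray, K_norm: np.ndarray):
    """Project a 3D world point into a camera given as an OpenCV camera-to-world matrix.

    Returns (u, v) in normalized image coordinates and the depth along the camera's +Z axis.
    """
    w2c = np.linalg.inv(c2w)  # world-to-camera is the inverse of camera-to-world
    p_h = np.append(np.asarray(point_world, dtype=np.float64), 1.0)
    x, y, z = (w2c @ p_h)[:3]
    uv = K_norm @ np.array([x / z, y / z, 1.0])  # +X right, +Y down, +Z forward
    return uv[:2], z
```

For example, with a 256x256 image and fx = fy = 256, cx = cy = 128, a point straight ahead of an identity-pose camera projects to (0.5, 0.5), the image center, and a point displaced along +Y moves toward the bottom of the image.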

Datasets

Please refer to DATASETS.md for dataset preparation.

Running the Code

Training

First, download the MASt3R pretrained model and place it in the ./pretrained_weights directory.

Then call src/main.py via:

# 8 GPUs, batch size 16 per GPU. Remove the last two arguments if you don't want to use wandb for logging
python -m src.main +experiment=re10k wandb.mode=online wandb.name=re10k

This default training configuration requires 8 GPUs with a batch size of 16 per GPU (>=80GB memory each). Training takes approximately 6 hours. You can adjust the batch size to fit your hardware, but note that changing the total batch size may require adjusting the initial learning rate to maintain performance. For training on a single A6000 GPU (48GB memory), refer to the re10k_1x8 configuration, which produces similar performance.
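When changing the total batch size, one common heuristic for picking a new initial learning rate is to scale it with the batch size. The sketch below implements the linear scaling rule and a milder square-root variant; neither is confirmed to be what the authors used, the function names are hypothetical, and the actual base learning rate should be read from the experiment config rather than assumed. The default setup above corresponds to a total batch size of 8 × 16 = 128.

```python
import math


def total_batch_size(num_gpus: int, per_gpu_batch: int) -> int:
    """Effective batch size across all GPUs."""
    return num_gpus * per_gpu_batch


def scaled_learning_rate(base_lr: float, base_batch: int, new_batch: int, rule: str = "linear") -> float:
    """Rescale a learning rate tuned for base_batch to new_batch.

    'linear' keeps lr / batch constant (linear scaling rule);
    'sqrt' keeps lr / sqrt(batch) constant, a milder variant sometimes used with Adam-style optimizers.
    """
    ratio = new_batch / base_batch
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown rule: {rule!r}")


if __name__ == "__main__":
    default_batch = total_batch_size(8, 16)  # the default 8-GPU setup: 128
    new_batch = total_batch_size(2, 16)      # e.g. 2 GPUs with the same per-GPU batch
    # base_lr=1.0 yields a factor to multiply the configured learning rate by.
    factor = scaled_learning_rate(1.0, default_batch, new_batch)
    print(f"multiply the configured learning rate by {factor:g}")
```

Treat the result as a starting point for tuning: the README only states that the learning rate may need adjusting, not by how much.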

Evaluation

Novel View Synthesis

# RealEstate10K
python -m src.main +experiment=re10k mode=test wandb.name=re10k dataset/view_sampler@dataset.re10k.view_sampler=evaluation dataset.re10k.view_sampler.index_path=assets/evaluation_index_re10k.json checkpointing.load=./pretrained_weights/re10k.ckpt test.save_image=true
# ACID
python -m src.main +experiment=acid mode=test wandb.name=acid dataset/view_sampler@dataset.re10k.view_sampler=evaluation dataset.re10k.view_sampler.index_path=assets/evaluation_index_acid.json checkpointing.load=./pretrained_weights/acid.ckpt test.save_image=true

You can set wandb.name=SAVE_FOLDER_NAME to choose the name of the folder where results are saved.
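Because the RealEstate10K and ACID evaluation commands differ only in the experiment name, checkpoint, and index file, they can also be assembled programmatically. This hypothetical helper reproduces the two novel-view-synthesis commands above; the assumption that the index file is always named `assets/evaluation_index_<experiment>.json` holds for these two datasets but is not guaranteed for others. As in the original commands, ACID also uses the `dataset.re10k` keys.

```python
from typing import List, Optional


def nvs_eval_command(
    experiment: str,
    checkpoint: str,
    run_name: Optional[str] = None,
    save_images: bool = True,
) -> List[str]:
    """Build the argument list for a novel-view-synthesis evaluation run."""
    run_name = run_name or experiment  # wandb.name also names the output folder
    return [
        "python", "-m", "src.main",
        f"+experiment={experiment}",
        "mode=test",
        f"wandb.name={run_name}",
        # Both datasets reuse the dataset.re10k keys, exactly as in the commands above.
        "dataset/view_sampler@dataset.re10k.view_sampler=evaluation",
        f"dataset.re10k.view_sampler.index_path=assets/evaluation_index_{experiment}.json",
        f"checkpointing.load={checkpoint}",
        f"test.save_image={'true' if save_images else 'false'}",
    ]


if __name__ == "__main__":
    # Print the command; launching it is left to the caller.
    print(" ".join(nvs_eval_command("acid", "./pretrained_weights/acid.ckpt")))
```

To actually launch an evaluation, pass the list to `subprocess.run(cmd, check=True)` from the repository root with the environment activated.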

Pose Estimation

To evaluate pose estimation performance, run the following commands:

# RealEstate10K
python -m src.eval_pose +experiment=re10k +evaluation=eval_pose checkpointing.load=./pretrained_weights/mixRe10kDl3dv.ckpt dataset/view_sampler@dataset.re10k.view_sampler=evaluation dataset.re10k.view_sampler.index_path=assets/evaluation_index_re10k.json
# ACID
python -m src.eval_pose +experiment=acid +evaluation=eval_pose checkpointing.load=./pretrained_weights/mixRe10kDl3dv.ckpt dataset/view_sampler@dataset.re10k.view_sampler=evaluation dataset.re10k.view_sampler.index_path=assets/evaluation_index_acid.json
# ScanNet-1500
python -m src.eval_pose +experiment=scannet_pose +evaluation=eval_pose checkpointing.load=./pretrained_weights/mixRe10kDl3dv.ckpt

Note that these examples evaluate the mixed model trained on RealEstate10K and DL3DV. You can replace the checkpoint path with that of any other trained model.
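For intuition about what pose accuracy means in this camera convention, the sketch below computes a relative pose between two OpenCV camera-to-world matrices and two commonly reported errors: the rotation angle between predicted and ground-truth rotations, and the angle between translation directions (translation from image pairs is only known up to scale). This is an illustrative assumption: we have not verified that `src.eval_pose` reports exactly these quantities, and the function names are hypothetical.

```python
import numpy as np


def relative_pose(c2w_a: np.ndarray, c2w_b: np.ndarray) -> np.ndarray:
    """4x4 pose of camera b expressed in camera a's coordinate frame."""
    return np.linalg.inv(c2w_a) @ c2w_b


def rotation_error_deg(R_pred: np.ndarray, R_gt: np.ndarray) -> float:
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))  # clip guards rounding


def translation_angle_deg(t_pred: np.ndarray, t_gt: np.ndarray, eps: float = 1e-12) -> float:
    """Angle (degrees) between translation directions; ignores the unknown scale."""
    denom = max(float(np.linalg.norm(t_pred) * np.linalg.norm(t_gt)), eps)
    cos = float(np.dot(t_pred, t_gt)) / denom
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```

Given predicted and ground-truth relative poses `P` and `G` (e.g. from `relative_pose`), call `rotation_error_deg(P[:3, :3], G[:3, :3])` and `translation_angle_deg(P[:3, 3], G[:3, 3])`.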

Acknowledgements

This project builds on several fantastic repos: pixelSplat, DUSt3R, and CroCo. We thank the original authors for their excellent work. We also thank David Charatan for kindly providing the evaluation code and the pretrained models for some of the previous methods.

Citation

@article{ye2024noposplat,
      title   = {No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images},
      author  = {Ye, Botao and Liu, Sifei and Xu, Haofei and Li, Xueting and Pollefeys, Marc and Yang, Ming-Hsuan and Peng, Songyou},
      journal = {arXiv preprint arXiv:2410.24207},
      year    = {2024}
    }