Jing He1✱,
Haodong Li1✱,
Wei Yin2,
Yixun Liang1,
Kaiqiang Zhou3,
Hongbo Zhang3,
Bingbing Liu3,
Ying-Cong Chen1,4✉
We present Lotus, a diffusion-based visual foundation model for dense geometry prediction. With minimal training data, Lotus achieves SoTA performance in two key geometry perception tasks, i.e., zero-shot depth and normal estimation. "Avg. Rank" indicates the average ranking across all metrics, where lower values are better. Bar length represents the amount of training data used.
This installation was tested on: Ubuntu 20.04 LTS, Python 3.10, CUDA 12.3, NVIDIA A800-SXM4-80GB.
Clone the repository (requires git):
git clone https://github.com/EnVision-Research/Lotus.git
cd Lotus
Install dependencies (requires conda):
conda create -n lotus python=3.10 -y
conda activate lotus
pip install -r requirements.txt
Launch the app for depth estimation:
python app.py depth
Or for normal estimation:
python app.py normal
Place your images in a directory, for example `assets/in-the-wild_example` (where we have prepared several examples), then run the inference command:
bash infer.sh
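Before running the inference script, it can help to confirm which files in the input directory look like images. The following is a minimal sketch and not part of the Lotus repository: the helper `list_images` and the extension set are hypothetical, and the files that `infer.sh` actually picks up may differ.

```python
from pathlib import Path

# Assumption: these extensions are illustrative; infer.sh may accept others.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def list_images(directory):
    """Return sorted image paths directly under `directory` (non-recursive)."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    # Directory name taken from the README; adjust to your own input folder.
    for path in list_images("assets/in-the-wild_example"):
        print(path)
```

An empty result means either the directory is missing or none of its files match the assumed extensions.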
Prepare benchmark datasets:
- For **depth** estimation, download the evaluation datasets with the following commands:
cd datasets/eval/depth/
wget -r -np -nH --cut-dirs=4 -R "index.html*" -P . https://share.phys.ethz.ch/~pf/bingkedata/marigold/evaluation_dataset/
- For **normal** estimation, download the [evaluation datasets (normal)](https://drive.google.com/drive/folders/1t3LMJIIrSnCGwOEf53Cyg0lkSXd3M4Hm?usp=drive_link) (`dsine_eval.zip`) into `datasets/eval/normal/` and unzip it there (following [DSINE](https://github.com/baegwangbin/DSINE?tab=readme-ov-file#getting-started)).
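The unzip step for the normal-estimation data can also be scripted. The sketch below uses only the Python standard library and assumes `dsine_eval.zip` has already been downloaded manually into `datasets/eval/normal/`; the helper name `extract_archive` is hypothetical and not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, target_dir):
    """Extract `zip_path` into `target_dir` and return the archive's member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()


if __name__ == "__main__":
    # Paths follow the README; the archive itself is not downloaded here.
    archive_path = Path("datasets/eval/normal/dsine_eval.zip")
    if archive_path.exists():
        names = extract_archive(archive_path, archive_path.parent)
        print(f"extracted {len(names)} entries")
    else:
        print(f"{archive_path} not found; download it first")
```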
Run the evaluation command:
bash eval.sh
Below are the released models and their corresponding configurations:

| CHECKPOINT_DIR | TASK_NAME | MODE |
|---|---|---|
| jingheya/lotus-depth-g-v1-0 | depth | generation |
| jingheya/lotus-depth-d-v1-0 | depth | regression |
| jingheya/lotus-depth-g-v2-0-disparity | depth (disparity) | generation |
| jingheya/lotus-depth-d-v2-0-disparity | depth (disparity) | regression |
| jingheya/lotus-normal-g-v1-0 | normal | generation |
| jingheya/lotus-normal-d-v1-0 | normal | regression |
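To avoid pairing a checkpoint with the wrong task or mode, the table can be kept as a small lookup. This is only an illustrative sketch: `RELEASED_MODELS` and `get_config` are hypothetical names, and how the three values are actually consumed by the repository's scripts is not shown here.

```python
# Mirrors the table of released models: CHECKPOINT_DIR -> (TASK_NAME, MODE).
RELEASED_MODELS = {
    "jingheya/lotus-depth-g-v1-0": ("depth", "generation"),
    "jingheya/lotus-depth-d-v1-0": ("depth", "regression"),
    "jingheya/lotus-depth-g-v2-0-disparity": ("depth (disparity)", "generation"),
    "jingheya/lotus-depth-d-v2-0-disparity": ("depth (disparity)", "regression"),
    "jingheya/lotus-normal-g-v1-0": ("normal", "generation"),
    "jingheya/lotus-normal-d-v1-0": ("normal", "regression"),
}


def get_config(checkpoint_dir):
    """Return the TASK_NAME and MODE listed for `checkpoint_dir`."""
    try:
        task_name, mode = RELEASED_MODELS[checkpoint_dir]
    except KeyError:
        raise ValueError(f"unknown checkpoint: {checkpoint_dir!r}") from None
    return {
        "CHECKPOINT_DIR": checkpoint_dir,
        "TASK_NAME": task_name,
        "MODE": mode,
    }


if __name__ == "__main__":
    print(get_config("jingheya/lotus-normal-d-v1-0"))
```

Raising on an unknown name, rather than guessing from the checkpoint string, keeps the lookup limited to the combinations the table lists.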
If you find our work useful in your research, please consider citing our paper:
@article{he2024lotus,
title={Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction},
author={He, Jing and Li, Haodong and Yin, Wei and Liang, Yixun and Li, Leheng and Zhou, Kaiqiang and Zhang, Hongbo and Liu, Bingbing and Chen, Ying-Cong},
journal={arXiv preprint arXiv:2409.18124},
year={2024}
}