
VistaDream: Sampling multiview consistent images for single-view scene reconstruction

This is the official PyTorch implementation of the following publication:

VistaDream: Sampling multiview consistent images for single-view scene reconstruction
Haiping Wang, Yuan Liu, Ziwei Liu, Wenping Wang, Zhen Dong, Bisheng Yang
arXiv 2024
Paper: https://arxiv.org/abs/2410.16892 | Project page (with interactive demos): https://vistadream-project-page.github.io/

🔭 Introduction

TL;DR: VistaDream is a training-free framework to reconstruct a high-quality 3D scene from a single-view image.

(Demo media: the input image, RGBs of the reconstructed scene, and depths of the reconstructed scene.)
More results and interactive demos are provided on the Project Page.

Abstract: In this paper, we propose VistaDream, a novel framework to reconstruct a 3D scene from a single-view image. Recent diffusion models enable generating high-quality novel-view images from a single-view input image. Most existing methods concentrate only on the consistency between the input image and the generated images, while neglecting the consistency among the generated images themselves. VistaDream addresses this problem with a two-stage pipeline. In the first stage, VistaDream builds a global coarse 3D scaffold by zooming out slightly from the input view, outpainting the boundaries, and estimating a depth map. Then, on this global scaffold, we use iterative diffusion-based RGB-D inpainting to generate novel-view images that fill the holes of the scaffold. In the second stage, we further enhance the consistency between the generated novel-view images with a novel training-free Multi-view Consistency Sampling (MCS) that introduces multi-view consistency constraints into the reverse sampling process of diffusion models. Experimental results demonstrate that, without training or fine-tuning existing diffusion models, VistaDream achieves consistent and high-quality novel view synthesis from just a single-view image and outperforms baseline methods by a large margin.
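To make the two-stage control flow above easier to follow, here is a toy, heavily hedged sketch. Every function name is a hypothetical placeholder operating on dummy data; none of them is the repository's actual API, and the real pipeline invokes the actual models listed under Pretrained models below.

```python
import numpy as np

# Toy, runnable sketch of the two-stage control flow described in the
# abstract. Every function below is a dummy stand-in with a hypothetical
# name and dummy data; none of them is the repository's actual API.

def outpaint_zoom_out(rgb):
    # Stand-in for slightly zooming out and outpainting the new border
    # (the repo uses Fooocus for outpainting); here we just pad with zeros.
    return np.pad(rgb, ((16, 16), (16, 16), (0, 0)))

def estimate_depth(rgb):
    # Stand-in for a monocular depth model such as Depth-Pro.
    return np.ones(rgb.shape[:2])

def rgbd_inpaint(rgb, depth, hole_mask):
    # Stand-in for diffusion-based RGB-D inpainting of scaffold holes.
    return rgb, depth

def mcs_refine(views):
    # Stand-in for training-free Multi-view Consistency Sampling (MCS):
    # the real method couples all views inside the reverse diffusion steps.
    return views

def vistadream_sketch(input_rgb, n_novel_views=4):
    # Stage 1a: coarse global scaffold from the zoomed-out image + depth.
    rgb = outpaint_zoom_out(input_rgb)
    depth = estimate_depth(rgb)
    views = [(rgb, depth)]
    # Stage 1b: iteratively generate novel views, inpainting the holes
    # that appear when the scaffold is rendered from each new camera.
    for _ in range(n_novel_views):
        hole_mask = np.zeros(rgb.shape[:2], dtype=bool)
        views.append(rgbd_inpaint(rgb, depth, hole_mask))
    # Stage 2: jointly refine all views for multi-view consistency.
    return mcs_refine(views)

views = vistadream_sketch(np.zeros((256, 256, 3)))
print(len(views))  # 1 scaffold view + 4 toy novel views
```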

🆕 News

💻 Requirements

The code has been tested on:

🔧 Installation

For complete installation instructions, please see INSTALL.md.

🚅 Pretrained models

VistaDream is training-free but relies on pretrained models from several existing projects. To download the pretrained weights for Fooocus, Depth-Pro, OneFormer, and SD-LCM, run:

```bash
bash download_weights.sh
```

The pretrained models of LLaVA and Stable Diffusion 1.5 will be downloaded automatically from Hugging Face on the first run.
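Since both come through the standard Hugging Face Hub cache, you can also pre-fetch them, e.g. for a machine that is offline at run time. A minimal sketch using huggingface_hub; the repo IDs below are assumptions (commonly used hosts for these models), not necessarily the exact IDs the code requests:

```python
from huggingface_hub import snapshot_download

# Repo IDs below are assumptions for illustration; check which model IDs
# the code actually loads before relying on them.
snapshot_download("llava-hf/llava-1.5-7b-hf")                     # LLaVA
snapshot_download("stable-diffusion-v1-5/stable-diffusion-v1-5")  # SD 1.5
```

If the target machine has no internet access at all, copying over the resulting Hugging Face cache directory achieves the same effect.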

🔦 Demo

Try VistaDream with the following command:

```bash
python demo.py
```

You should then obtain the reconstructed scene, saved as a .pth file (e.g. data/vistadream/piano/refine.scene.pth).

If you need to improve the reconstruction quality on your own images, please refer to INSTRUCT.md.

To visualize the generated Gaussian field, you can use the following script:

```python
import torch
from ops.utils import save_ply

# Load the reconstructed scene (the Gaussian field) saved by demo.py
scene = torch.load('data/vistadream/piano/refine.scene.pth')
# Export the Gaussians to a .ply file for external viewers
save_ply(scene, 'gf.ply')
```

Then load gf.ply into SuperSplat for visualization.
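If you only want a quick local sanity check of the Gaussian centers (without a full splat renderer), a generic point-cloud viewer also works, since the centers are stored as the .ply vertex positions. A minimal sketch assuming Open3D is installed; note it ignores the per-Gaussian attributes (scale, rotation, opacity, colors):

```python
import open3d as o3d

# Reads only the vertex positions (the Gaussian centers) from gf.ply;
# Gaussian-specific attributes are not interpreted.
pcd = o3d.io.read_point_cloud("gf.ply")
print(pcd)  # e.g. "PointCloud with N points."
o3d.visualization.draw_geometries([pcd])
```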

🔦 ToDo List

💡 Citation

If you find this repo helpful, please give us a 😍 star 😍, and please consider citing VistaDream if this program benefits your project:

```bibtex
@article{wang2024vistadream,
  title={VistaDream: Sampling multiview consistent images for single-view scene reconstruction},
  author={Haiping Wang and Yuan Liu and Ziwei Liu and Zhen Dong and Wenping Wang and Bisheng Yang},
  journal={arXiv preprint arXiv:2410.16892},
  year={2024}
}
```

🔗 Related Projects

We sincerely thank the following excellent open-source projects: