If you use our work in your research, please cite it as follows:
@article{tang2023MVDiffusion,
  title={MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion},
  author={Tang, Shitao and Zhang, Fuyang and Chen, Jiacheng and Wang, Peng and Furukawa, Yasutaka},
  journal={arXiv preprint arXiv:2307.01097},
  year={2023}
}
Install the necessary packages by running the following command:
pip install -r requirements.txt
We provide baseline results and pretrained models. Please place the downloaded weight files in MVDiffusion/weights.
Test the text-to-panorama demo by running:

python demo.py --text "This kitchen is a charming blend of rustic and modern, featuring a large reclaimed wood island with marble countertop, a sink surrounded by cabinets. To the left of the island, a stainless-steel refrigerator stands tall. To the right of the sink, built-in wooden cabinets painted in a muted."

Test the image-conditioned outpainting demo by running:

python demo.py --text_path assets/prompts.txt --image_path assets/outpaint_example.png
Organize the Matterport3D skybox data as follows:

data
├── mp3d_skybox
│   ├── train.npy
│   ├── test.npy
│   ├── 5q7pvUzZiYa
│   │   ├── blip3
│   │   ├── matterport_skybox_images
│   ├── 1LXtFkjw3qL
│   ├── ....
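As a quick sanity check, here is a minimal sketch for inspecting this layout. It assumes (our assumption, not documented above) that train.npy and test.npy store arrays of scene IDs such as 5q7pvUzZiYa:

```python
import numpy as np
from pathlib import Path

root = Path("data/mp3d_skybox")

# Assumption: the split files hold arrays of Matterport3D scene IDs.
train_scenes = np.load(root / "train.npy", allow_pickle=True)
test_scenes = np.load(root / "test.npy", allow_pickle=True)
print(f"{len(train_scenes)} train / {len(test_scenes)} test scenes")

# Each scene folder should contain BLIP captions and skybox images.
scene = root / "5q7pvUzZiYa"
for sub in ("blip3", "matterport_skybox_images"):
    print(f"{scene / sub} exists: {(scene / sub).is_dir()}")
```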
Organize the ScanNet data as follows:

data
├── scannet
│   ├── train
│   │   ├── scene0435_01
│   │   │   ├── color
│   │   │   ├── depth
│   │   │   ├── intrinsic
│   │   │   ├── pose
│   │   │   ├── prompt
│   │   │   ├── key_frame_0.6.txt
│   │   │   ├── valid_frames.npy
│   ├── test
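The same kind of check works for a ScanNet scene; scene0435_01 below is just the example scene from the tree above:

```python
from pathlib import Path

# Verify one ScanNet scene against the layout shown above.
scene = Path("data/scannet/train/scene0435_01")
expected_dirs = ["color", "depth", "intrinsic", "pose", "prompt"]
expected_files = ["key_frame_0.6.txt", "valid_frames.npy"]

missing = [d for d in expected_dirs if not (scene / d).is_dir()]
missing += [f for f in expected_files if not (scene / f).is_file()]
print("missing entries:", missing or "none")
```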
Execute the following scripts for testing:

- sh test_pano.sh: Generate 8 multi-view panoramic images on the Matterport3D test set.
- sh test_pano_outpaint.sh: Generate 8 multi-view images conditioned on a single view image (outpainting) on the Matterport3D test set.
- sh test_depth_fix_frames.sh: Generate 12 depth-conditioned images on the ScanNet test set.
- sh test_depth_fix_interval.sh: Generate a sequence of depth-conditioned images (every 20 frames) on the ScanNet test set.
- sh test_depth_two_stage.sh: Generate a sequence of depth-conditioned key-frame images, then interpolate the in-between images, on the ScanNet test set.

After running either sh test_depth_fix_interval.sh or sh test_depth_two_stage.sh, you can use TSDF fusion to obtain a textured mesh.
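A minimal TSDF-fusion sketch using Open3D follows. This is not necessarily the pipeline used by this repository: the frame list, intrinsics, and depth scale (ScanNet depth PNGs are stored in millimeters) are assumptions you should adapt to your generated outputs.

```python
import numpy as np
import open3d as o3d

# Placeholder intrinsics; read the real values from the scene's
# intrinsic/ folder instead.
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    640, 480, 577.87, 577.87, 319.5, 239.5)

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.02,  # 2 cm voxels
    sdf_trunc=0.08,     # truncation distance in meters
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)

# Hypothetical frame list: (generated color image, ScanNet depth map,
# 4x4 camera-to-world pose). Adjust the paths to your outputs.
frames = [
    ("outputs/0.png",
     "data/scannet/train/scene0435_01/depth/0.png",
     np.loadtxt("data/scannet/train/scene0435_01/pose/0.txt")),
]

for color_path, depth_path, pose in frames:
    color = o3d.io.read_image(color_path)
    depth = o3d.io.read_image(depth_path)
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth,
        depth_scale=1000.0,  # ScanNet depth is in millimeters
        depth_trunc=4.0,
        convert_rgb_to_intensity=False)
    # integrate() expects the world-to-camera extrinsic.
    volume.integrate(rgbd, intrinsic, np.linalg.inv(pose))

mesh = volume.extract_triangle_mesh()
mesh.compute_vertex_normals()
o3d.io.write_triangle_mesh("fused_mesh.ply", mesh)
```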
Execute the following scripts for training:

- sh train_pano.sh: Train the panoramic image generation model.
- sh train_pano_outpaint.sh: Train the panoramic image outpainting model.
- sh train_depth.sh: Train the depth-conditioned generation model.

Example results: panorama generation and multi-view depth-to-image generation.
This project is licensed under the terms of the MIT license.
For any questions, feel free to contact us at shitaot@sfu.ca.