ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
Xianghui Yang1,2, Yan Zuo1, Sameera Ramasinghe1, Loris Bazzani1, Gil Avraham1, Anton van den Hengel1,3
1Amazon, 2The University of Sydney, 3The University of Adelaide
Training? No! We don't need any training or finetuning. :wink:
We use the totally same environment with Zero-1-to-3.
conda create -n zero123 python=3.9
conda activate zero123
cd zero123
pip install -r requirements.txt
git clone https://github.com/CompVis/taming-transformers.git
pip install -e taming-transformers/
git clone https://github.com/openai/CLIP.git
pip install -e CLIP/
Download checkpoint under zero123
through one of the following sources:
https://huggingface.co/cvlab/zero123-weights/tree/main
wget https://cv.cs.columbia.edu/zero123/assets/$iteration.ckpt # iteration = [105000, 165000, 230000, 300000]
wget https://zero123.cs.columbia.edu/assets/zero123-xl.ckpt
Zero-1-to-3 has released 5 model weights: 105000.ckpt
, 165000.ckpt
, 230000.ckpt
, 300000.ckpt
, and zero123-xl.ckpt
. By default, we use zero123-xl.ckpt
, but we also find that 105000.ckpt which is the checkpoint after finetuning 105000 iterations on objaverse has better generalization ablitty. So if you are trying to generate novel-view images and find one model fails, you can try another one.
We have provided some processed real images in ./3drec/data/real_images/
. You can directly run generate_360_view_autoregressive.py
and play it. Don't forget to change the model path and configure path in the code.
cd ./3drec/data/
python generate_360_view_autoregressive_real.py # generate 360-degree images for real image demo
python generate_360_view_autoregressive.py # generate 360-degree images for multi-view consistency evaluation
python generate_zero123renderings_autoregressive.py # generate only 1 novel-view image given target pose for image quality evaluation
If you want to try it on your own images, you may need to pre-process them, including resize and segmentation.
cd ./3drec/data/
python process_real_images.py
If you find any other interesting images that can be shown here, please send it to me and I'm very happy to make our project more attractive! :wink:
Zero-1-to-3 was trained on objaverse and you can download their renderings with:
wget https://tri-ml-public.s3.amazonaws.com/datasets/views_release.tar.gz
The GSO renderings evaluated in our paper can be downloaded with:
gdown https://drive.google.com/file/d/10JpfE5jqJRiuhQbxn74ks8RsLp1wpMim/view?usp=sharing
Note that we haven't use the distillation way to get the 3D model, no matter SDS or SJC. We directly train NeuS as our model can generate consistent multi-view images. Feel free to explore and play around!
cd 3drec
pip install -r requirements.txt
cd ../syncdreamer_3drec
python my_train_renderer_spin36.py \
-i {img_dir}/{model_name}/{inference_id} \
-n {model_name} \
-e {elevation} \
-d {distance} \
-l {output_dir}
syncdreamer_3drec/{output_dir}/{model_name}
. If you are trying on real images and have no idea about the evaluation
and distance
, maybe you can set them as default 60/180*pi
and 1.5
, respectively.This repository is based on Stable Diffusion, Zero-1-to-3, Objaverse, NeuS and SyncDreamer. We would like to thank the authors of these work for publicly releasing their code.
@misc{yang2024viewfusion,
title={ViewFusion: Towards Multi-View Consistency via Interpolated Denoising},
author={Xianghui Yang and Yan Zuo and Sameera Ramasinghe and Loris Bazzani and Gil Avraham and Anton van den Hengel},
year={2024},
eprint={2402.18842},
archivePrefix={arXiv},
primaryClass={cs.CV}
}