
WonderJourney: Going from Anywhere to Everywhere

[![a](https://img.shields.io/badge/Website-WonderJourney-blue)](https://kovenyu.com/wonderjourney/) [![arXiv](https://img.shields.io/badge/arXiv-2312.03884-red)](https://arxiv.org/abs/2312.03884) [![twitter](https://img.shields.io/twitter/url?label=Koven_Yu&url=https%3A%2F%2Ftwitter.com%2FKoven_Yu)](https://twitter.com/Koven_Yu)

https://github.com/KovenYu/WonderJourney/assets/27218043/43c864b5-2416-4177-ae39-347150968bc3

https://github.com/KovenYu/WonderJourney/assets/27218043/70eb220d-2521-4033-b736-cf88755a3bcb

WonderJourney: Going from Anywhere to Everywhere

Hong-Xing "Koven" Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann

Getting Started

Installation

Installation requires a CUDA-compatible GPU, and running WonderJourney takes about 24GB of GPU memory.

Clone the repo and create the environment:

git clone https://github.com/KovenYu/WonderJourney.git
cd WonderJourney
mamba create --name wonderjourney python=3.10
mamba activate wonderjourney

We use PyTorch3D for rendering. Run the following commands to install it, or follow the official installation guide (installation may take some time).

mamba install pytorch=1.13.0 torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
mamba install -c fvcore -c iopath -c conda-forge fvcore iopath
mamba install -c bottler nvidiacub
mamba install pytorch3d -c pytorch3d
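
To verify the install and confirm that a sufficiently large GPU is visible, you can run a quick check like the one below (this snippet is our addition, not part of the repo):

import torch
import pytorch3d

print("torch:", torch.__version__, "| pytorch3d:", pytorch3d.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # WonderJourney needs roughly 24GB of GPU memory
    print("GPU memory (GB):", torch.cuda.get_device_properties(0).total_memory / 1e9)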

Install the rest of the requirements:

pip install -r requirements.txt

Download the English language model for spaCy:

python -m spacy download en_core_web_sm
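
To confirm the model is available to spaCy, a minimal check (our addition):

import spacy

# Loads the pipeline installed by the download command above
nlp = spacy.load("en_core_web_sm")
doc = nlp("A quiet village beside a snowy mountain.")
print([(token.text, token.pos_) for token in doc])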

Export your OpenAI API key (we use GPT-4 to generate scene descriptions):

export OPENAI_API_KEY='your_api_key_here'
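
A quick way to confirm the key is visible to Python before running anything expensive (our addition):

import os

# Fails loudly if the key was not exported in this shell session
key = os.environ.get("OPENAI_API_KEY")
assert key, "OPENAI_API_KEY is not set"
print("OpenAI key found, ending in:", key[-4:])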

Download the MiDaS DPT model and put it in the repository root directory:

wget https://github.com/isl-org/MiDaS/releases/download/v3_1/dpt_beit_large_512.pt
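
To verify the download is intact, you can try loading the checkpoint on CPU (our addition; the file is a standard PyTorch checkpoint):

import torch

# Load on CPU just to verify the file is not corrupted or truncated
state = torch.load("dpt_beit_large_512.pt", map_location="cpu")
print("Loaded MiDaS checkpoint with", len(state), "top-level entries")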

Run examples
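
The exact quick-start commands are not reproduced here; as a sketch, an example run might look like the following, where run.py and config/village.yaml are our guesses at the repository's entry script and example config (check the repo tree for the exact names):

python run.py --example_config config/village.yaml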

How to add more examples?

We highly encourage you to add new images and try new ideas! You will need to do the image-caption pairing yourself (e.g., using DALL-E to generate an image and GPT-4V to generate its description), as sketched below.
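
As a minimal sketch of that pairing step, assuming the openai>=1.0 Python client (the model names and output path here are illustrative, and this helper is our addition, not part of WonderJourney):

import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1) Generate a starting image with DALL-E 3
image = client.images.generate(
    model="dall-e-3",
    prompt="A cozy mountain village at dusk, painterly style",
    size="1024x1024",
    response_format="b64_json",
)
with open("my_example.png", "wb") as f:
    f.write(base64.b64decode(image.data[0].b64_json))

# 2) Ask a GPT-4 vision model to write the matching description
with open("my_example.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()
caption = client.chat.completions.create(
    model="gpt-4-vision-preview",  # substitute whichever vision model you have access to
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this scene in one sentence."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(caption.choices[0].message.content)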

Citation

@article{yu2023wonderjourney,
  title={WonderJourney: Going from Anywhere to Everywhere},
  author={Yu, Hong-Xing and Duan, Haoyi and Hur, Junhwa and Sargent, Kyle and Rubinstein, Michael and Freeman, William T and Cole, Forrester and Sun, Deqing and Snavely, Noah and Wu, Jiajun and Herrmann, Charles},
  journal={arXiv preprint arXiv:2312.03884},
  year={2023}
}

Acknowledgement

We appreciate the authors of SceneScape, MiDaS, SAM, Stable Diffusion, and OneFormer for sharing their code.