✨ Check out our new work MagicDrive3D on 3D scene generation!
✨ If you want video generation, please find the code at the video branch.
Videos generated by MagicDrive (click the image to see the video).
This repository contains the implementation of the paper
MagicDrive: Street View Generation with Diverse 3D Geometry Control
Ruiyuan Gao1*, Kai Chen2*, Enze Xie3^, Lanqing Hong3, Zhenguo Li3, Dit-Yan Yeung2, Qiang Xu1^
1CUHK 2HKUST 3Huawei Noah's Ark Lab
*Equal Contribution ^Corresponding Authors
In MagicDrive, we employ two strategies (cross-attention and additive encoder branch) to inject text prompts, camera poses, object boxes, and road maps as conditions for generation. We also propose a cross-view attention module for multiview consistency.
Clone this repo with submodules
git clone --recursive https://github.com/cure-lab/MagicDrive.git
The code is tested with PyTorch==1.10.2 and CUDA 10.2 on V100 servers. To set up the Python environment, follow:
# option1: to run GUI only
pip install -r requirements/gui.txt
# 😍 our GUI does not need mm-series packages.
# continue to install diffusers from `third_party`.
# option2: to run the full testing demo (and also test your env before training)
cd ${ROOT}
pip install -r requirements/dev.txt
# continue to install the `third_party` packages as follows.
We install the following packages from source, with cd ${FOLDER}; pip -vvv install .
# install third-party
third_party/
├── bevfusion -> based on db75150
├── diffusers -> based on v0.17.1 (afcca39)
└── xformers -> based on v0.0.19 (8bf59c9), optional
See the note about our xformers. If you have issues with the environment setup, please check the FAQ first.
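Before debugging further, it can help to see which versions are actually installed. The following is a minimal sketch, not part of this repository: it reports the installed versions of the packages pinned above using only the standard library. The distribution names (`torch`, `diffusers`, `xformers`) and the prefix comparison are assumptions; a source build may report a version string with a local suffix or a development tag.

```python
from importlib import metadata

# Versions named in this README. The distribution names are assumptions.
EXPECTED = {
    "torch": "1.10.2",
    "diffusers": "0.17.1",
    "xformers": "0.0.19",  # optional
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED):
    """Map each package name to (installed_version, starts_with_expected)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        # Compare by prefix only, since builds may append a suffix such as "+cu102".
        ok = found is not None and found.startswith(wanted)
        report[name] = (found, ok)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        status = "ok" if ok else "check"
        print(f"{name:10s} installed={found} expected~={EXPECTED[name]} [{status}]")
```

A "check" status only means the reported version differs from the one listed here; it does not by itself mean the setup is broken.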
Set up the default configuration for accelerate with
accelerate config
Our default log directory is ${ROOT}/magicdrive-log. Please prepare it in advance.
Our training is based on stable-diffusion-v1-5. We assume you put the weights at ${ROOT}/pretrained/ as follows:
${ROOT}/pretrained/stable-diffusion-v1-5/
├── text_encoder
├── tokenizer
├── unet
├── vae
└── ...
Download our pretrained weights for MagicDrive from OneDrive and put them in ${ROOT}/pretrained/.
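To confirm the layout before running anything, a small check like the one below may be useful. It is a sketch, not part of this repository: the function name `missing_pretrained` is hypothetical, and it assumes the MagicDrive checkpoint is stored under the name used in the test commands of this README (`SDv1.5mv-rawbox_2023-09-07_18-39_224x400`).

```python
from pathlib import Path

# Sub-folders listed in the tree above.
SD_SUBDIRS = ["text_encoder", "tokenizer", "unet", "vae"]
# Assumption: the downloaded MagicDrive weights keep this name.
MAGICDRIVE_CKPT = "SDv1.5mv-rawbox_2023-09-07_18-39_224x400"


def missing_pretrained(root):
    """Return expected entries (relative to root) that do not exist yet."""
    root = Path(root)
    pretrained = root / "pretrained"
    expected = [pretrained / "stable-diffusion-v1-5" / name for name in SD_SUBDIRS]
    expected.append(pretrained / MAGICDRIVE_CKPT)
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = missing_pretrained(".")
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All expected pretrained entries were found.")
```

Run it from ${ROOT}. An empty result only shows that the paths exist, not that the downloads are complete.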
Run our demo
👍 We recommend users run our interactive GUI first, because we have minimized the dependencies for the GUI demo.
cd ${ROOT}
python demo/interactive_gui.py
# a gradio-based gui, use your web browser
As suggested by #37, the prompt is configurable through the GUI!
Run our demo for camera view generation.
cd ${ROOT}
python demo/run.py resume_from_checkpoint=magicdrive-log/SDv1.5mv-rawbox_2023-09-07_18-39_224x400
The generated images will be located in magicdrive-log/test. More information can be found in the demo doc.
We prepare the nuScenes dataset similar to bevfusion's instructions. Specifically, put the nuScenes dataset in ./data/. You should have these files:
data/nuscenes
├── maps
├── mini
├── samples
├── sweeps
├── v1.0-mini
└── v1.0-trainval
Tip: You can download the .pkl files from OneDrive. They should be enough for training and testing.
Generate mmdet3d annotation files by:
python tools/create_data.py nuscenes --root-path ./data/nuscenes \
--out-dir ./data/nuscenes_mmdet3d_2 --extra-tag nuscenes
You should have these files:
data/nuscenes_mmdet3d_2
├── nuscenes_dbinfos_train.pkl (-> ${bevfusion-version}/nuscenes_dbinfos_train.pkl)
├── nuscenes_gt_database (-> ${bevfusion-version}/nuscenes_gt_database)
├── nuscenes_infos_train.pkl
└── nuscenes_infos_val.pkl
Note: As shown above, some files can be soft-linked to the original versions from bevfusion. If some of the files are located in data/nuscenes, you can move them to data/nuscenes_mmdet3d_2 manually.
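The soft-linking mentioned in the note could be done along these lines. This is a sketch under assumptions: `link_from_bevfusion` is a hypothetical helper, and the bevfusion output directory passed to it is a placeholder for wherever your bevfusion run wrote its files.

```python
import os
from pathlib import Path

# Entries shown as links in the listing above.
LINKABLE = ["nuscenes_dbinfos_train.pkl", "nuscenes_gt_database"]


def link_from_bevfusion(bevfusion_dir, target_dir, names=LINKABLE):
    """Symlink existing bevfusion outputs into target_dir; return the names linked."""
    src_root = Path(bevfusion_dir).resolve()
    dst_root = Path(target_dir)
    dst_root.mkdir(parents=True, exist_ok=True)
    linked = []
    for name in names:
        src, dst = src_root / name, dst_root / name
        if not src.exists() or dst.exists() or dst.is_symlink():
            continue  # nothing to link, or the target is already present
        os.symlink(src, dst, target_is_directory=src.is_dir())
        linked.append(name)
    return linked
```

For example, `link_from_bevfusion("/path/to/bevfusion/data/nuscenes", "data/nuscenes_mmdet3d_2")`, where the first path is a placeholder. Existing files in the target directory are never overwritten.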
(Optional) To accelerate data loading, we prepared cache files in h5 format for BEV maps. They can be generated through tools/prepare_map_aux.py with different configs in configs/dataset. For example:
python tools/prepare_map_aux.py +process=train
python tools/prepare_map_aux.py +process=val
You will have files like ./val_tmp.h5 and ./train_tmp.h5. You have to rename the cache files correctly after generating them. Our default is:
data/nuscenes_map_aux
├── train_26x200x200_map_aux_full.h5 (42G)
└── val_26x200x200_map_aux_full.h5 (9G)
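The renaming step can be scripted. The sketch below is not part of this repository: `move_map_cache` is a hypothetical helper that moves the temporary files into the default layout shown above. The target names match the default listing only; caches generated with other configs in configs/dataset would likely need different names.

```python
import shutil
from pathlib import Path

# Temporary name -> default name, taken from the listing above.
RENAMES = {
    "train_tmp.h5": "train_26x200x200_map_aux_full.h5",
    "val_tmp.h5": "val_26x200x200_map_aux_full.h5",
}


def move_map_cache(src_dir=".", dst_dir="data/nuscenes_map_aux"):
    """Move generated cache files into dst_dir under their default names."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for tmp_name, final_name in RENAMES.items():
        src = Path(src_dir) / tmp_name
        if src.is_file():
            shutil.move(str(src), str(dst / final_name))
            moved.append(final_name)
    return moved
```

Calling `move_map_cache()` from ${ROOT} after both generation commands have finished returns the list of files it placed. Note that an existing file with the same target name may be replaced.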
Launch training (with 8xV100):
accelerate launch --mixed_precision fp16 --gpu_ids all --num_processes 8 tools/train.py \
+exp=224x400 runner=8gpus
During training, you can check tensorboard for the log and intermediate results.
We also provide a debug config to test your environment and data loading process (with 2xV100):
accelerate launch --mixed_precision fp16 --gpu_ids all --num_processes 2 tools/train.py \
+exp=224x400 runner=debug runner.validation_before_run=true
After training, you can test your model for driving view generation through:
python tools/test.py resume_from_checkpoint=${YOUR_MODEL}
# take our pretrained model as an example
python tools/test.py resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400
Please find the results in ./magicdrive-log/test/.
To test FID
First, you should generate the full validation set with
python perception/data_prepare/val_set_gen.py \
resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \
task_id=224x400 fid.img_gen_dir=./tmp/224x400 +fid=data_gen +exp=224x400
# for map=zero as the null condition for CFG, add `runner.pipeline_param.use_zero_map_as_unconditional=true`
For this script, multi-process / multi-node execution is also available through accelerate. Just launch it with commands similar to those used for training.
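As an illustration of "similar to training", the sketch below assembles an accelerate launch command for the generation script by reusing the launcher flags from the training command in this README. This exact combination is an assumption and is not confirmed here; `accelerate_cmd` is a hypothetical helper, and the process count should match your hardware.

```python
import shlex


def accelerate_cmd(script, overrides, num_processes=8, mixed_precision="fp16"):
    """Build an `accelerate launch` argument list for a script plus its overrides."""
    cmd = [
        "accelerate", "launch",
        "--mixed_precision", mixed_precision,
        "--gpu_ids", "all",
        "--num_processes", str(num_processes),
        script,
    ]
    return cmd + list(overrides)


# Same overrides as the single-process command above.
val_gen = accelerate_cmd(
    "perception/data_prepare/val_set_gen.py",
    [
        "resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400",
        "task_id=224x400",
        "fid.img_gen_dir=./tmp/224x400",
        "+fid=data_gen",
        "+exp=224x400",
    ],
)
print(shlex.join(val_gen))  # inspect the command before running it
```

Once the printed command looks right, it can be pasted into a shell or run with `subprocess.run(val_gen, check=True)`.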
Then, test the FID score with
# we assume your torch cache dir is at "../pretrained/torch_cache/". If you want
# to use the default place, please comment out the second-to-last line in "tools/fid_score.py".
python tools/fid_score.py cfg \
resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \
fid.rootb=tmp/224x400
Alternatively, we provide the pre-generated samples for the validation set here. You can put them in ./tmp and launch the test through
python tools/fid_score.py cfg \
resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \
fid.rootb=tmp/224x400/samples # FID=14.46065995481922
# or `fid.rootb=tmp/224x400map0/samples`, FID=16.195992872931697
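For readers unfamiliar with the metric, FID is the Fréchet distance between two Gaussians fitted to image features of real and generated samples; lower is better. The toy sketch below shows the one-dimensional case only, where the distance reduces to the squared difference of means plus the squared difference of standard deviations. It is an illustration of the formula, not the computation in tools/fid_score.py, which works on high-dimensional network features with full covariance matrices.

```python
from statistics import fmean, pstdev


def frechet_distance_1d(xs, ys):
    """Squared Fréchet distance between 1-D Gaussians fitted to xs and ys."""
    mu_x, mu_y = fmean(xs), fmean(ys)
    # Population standard deviation; real implementations may use the sample estimate.
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    return (mu_x - mu_y) ** 2 + (sd_x - sd_y) ** 2


if __name__ == "__main__":
    same = frechet_distance_1d([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
    shifted = frechet_distance_1d([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
    print(same, shifted)  # identical sets give 0; a shifted set gives a larger value
```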
More results can be found in the main paper.
@inproceedings{gao2023magicdrive,
title={{MagicDrive}: Street View Generation with Diverse 3D Geometry Control},
author={Gao, Ruiyuan and Chen, Kai and Xie, Enze and Hong, Lanqing and Li, Zhenguo and Yeung, Dit-Yan and Xu, Qiang},
booktitle = {International Conference on Learning Representations},
year={2024}
}
We adopt the following open-sourced projects: