Efficient Geometry-aware 3D Generative Adversarial Networks
Eric R. Chan, Connor Z. Lin, Matthew A. Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas Guibas, Jonathan Tremblay, Sameh Khamis, Tero Karras, and Gordon Wetzstein
* equal contribution
https://nvlabs.github.io/eg3d/
Abstract: Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. We introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation and neural rendering, our framework is able to leverage state-of-the-art 2D CNN generators, such as StyleGAN2, and inherit their efficiency and expressiveness. We demonstrate state-of-the-art 3D-aware synthesis with FFHQ and AFHQ Cats, among other experiments.
For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing
cd eg3d
conda env create -f environment.yml
conda activate eg3d
Pre-trained networks are stored as *.pkl
files that can be referenced using local filenames. See Models for download links to pre-trained checkpoints.
# Generate videos using pre-trained model
python gen_videos.py --outdir=out --trunc=0.7 --seeds=0-3 --grid=2x2 \
--network=networks/network_snapshot.pkl
# Generate the same 4 seeds in an interpolation sequence
python gen_videos.py --outdir=out --trunc=0.7 --seeds=0-3 --grid=1x1 \
--network=networks/network_snapshot.pkl
# Generate images and shapes (as .mrc files) using pre-trained model
python gen_samples.py --outdir=out --trunc=0.7 --shapes=true --seeds=0-3 \
--network=networks/network_snapshot.pkl
We visualize our .mrc shape files with UCSF Chimerax.
To visualize a shape in ChimeraX do the following:
.mrc
file with File > Open
Tools > Volume Data > Volume Viewer
Lighting
menu in the top bar, change lighting to "Full"This release contains an interactive model visualization tool that can be used to explore various characteristics of a trained model. To start it, run:
python visualizer.py
See the Visualizer Guide
for a description of important options.
You can use pre-trained networks in your own Python code as follows:
with open('ffhq.pkl', 'rb') as f:
G = pickle.load(f)['G_ema'].cuda() # torch.nn.Module
z = torch.randn([1, G.z_dim]).cuda() # latent codes
c = torch.cat([cam2world_pose.reshape(-1, 16), intrinsics.reshape(-1, 9)], 1) # camera parameters
img = G(z, c)['image'] # NCHW, float32, dynamic range [-1, +1], no truncation
The above code requires torch_utils
and dnnlib
to be accessible via PYTHONPATH
. It does not need source code for the networks themselves — their class definitions are loaded from the pickle via torch_utils.persistence
.
The pickle contains three networks. 'G'
and 'D'
are instantaneous snapshots taken during training, and 'G_ema'
represents a moving average of the generator weights over several training steps. The networks are regular instances of torch.nn.Module
, with all of their parameters and buffers placed on the CPU at import and gradient computation disabled by default.
The generator consists of two submodules, G.mapping
and G.synthesis
, that can be executed separately. They also support various additional options:
w = G.mapping(z, conditioning_params, truncation_psi=0.5, truncation_cutoff=8)
img = G.synthesis(w, camera_params)['image]
Please refer to gen_samples.py
for complete code example.
Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json
for labels. Each label is a 25-length list of floating point numbers, which is the concatenation of the flattened 4x4 camera extrinsic matrix and flattened 3x3 camera intrinsic matrix. Custom datasets can be created from a folder containing images; see python dataset_tool.py --help
for more information. Alternatively, the folder can also be used directly as a dataset, without running it through dataset_tool.py
first, but doing so may lead to suboptimal performance.
FFHQ: Download and process the Flickr-Faces-HQ dataset using the following commands.
Ensure the Deep3DFaceRecon_pytorch submodule is properly initialized
git submodule update --init --recursive
Run the following commands
cd dataset_preprocessing/ffhq
python runme.py
Optional: preprocessing in-the-wild portrait images. In case you want to crop in-the-wild face images and extract poses using Deep3DFaceRecon_pytorch in a way that align with the FFHQ data above and the checkpoint, run the following commands
cd dataset_preprocessing/ffhq
python preprocess_in_the_wild.py --indir=INPUT_IMAGE_FOLDER
AFHQv2: Download and process the AFHQv2 dataset with the following.
cd dataset_preprocessing/afhq
python runme.py "path/to/downloaded/afhq.zip"
ShapeNet Cars: Download and process renderings of the cars category of ShapeNet using the following commands. NOTE: the following commands download renderings of the ShapeNet cars from the Scene Representation Networks repository.
cd dataset_preprocessing/shapenet
python runme.py
You can train new networks using train.py
. For example:
# Train with FFHQ from scratch with raw neural rendering resolution=64, using 8 GPUs.
python train.py --outdir=~/training-runs --cfg=ffhq --data=~/datasets/FFHQ_512.zip \
--gpus=8 --batch=32 --gamma=1 --gen_pose_cond=True
# Second stage finetuning of FFHQ to 128 neural rendering resolution (optional).
python train.py --outdir=~/training-runs --cfg=ffhq --data=~/datasets/FFHQ_512.zip \
--resume=~/training-runs/ffhq_experiment_dir/network-snapshot-025000.pkl \
--gpus=8 --batch=32 --gamma=1 --gen_pose_cond=True --neural_rendering_resolution_final=128
# Train with Shapenet from scratch, using 8 GPUs.
python train.py --outdir=~/training-runs --cfg=shapenet --data=~/datasets/cars_train.zip \
--gpus=8 --batch=32 --gamma=0.3
# Train with AFHQ, finetuning from FFHQ with ADA, using 8 GPUs.
python train.py --outdir=~/training-runs --cfg=afhq --data=~/datasets/afhq.zip \
--gpus=8 --batch=32 --gamma=5 --aug=ada --neural_rendering_resolution_final=128 --gen_pose_cond=True --gpc_reg_prob=0.8
Please see the Training Guide for a guide to setting up a training run on your own data.
Please see Models for recommended training configurations and download links for pre-trained checkpoints.
The results of each training run are saved to a newly created directory, for example ~/training-runs/00000-ffhq-ffhq512-gpus8-batch32-gamma1
. The training loop exports network pickles (network-snapshot-<KIMG>.pkl
) and random image grids (fakes<KIMG>.png
) at regular intervals (controlled by --snap
). For each exported pickle, it evaluates FID (controlled by --metrics
) and logs the result in metric-fid50k_full.jsonl
. It also records various statistics in training_stats.jsonl
, as well as *.tfevents
if TensorBoard is installed.
By default, train.py
automatically computes FID for each network pickle exported during training. We recommend inspecting metric-fid50k_full.jsonl
(or TensorBoard) at regular intervals to monitor the training progress. When desired, the automatic computation can be disabled with --metrics=none
to speed up the training slightly.
Additional quality metrics can also be computed after the training:
# Previous training run: look up options automatically, save result to JSONL file.
python calc_metrics.py --metrics=fid50k_full \
--network=~/training-runs/network-snapshot-000000.pkl
# Pre-trained network pickle: specify dataset explicitly, print result to stdout.
python calc_metrics.py --metrics=fid50k_full --data=~/datasets/ffhq_512.zip \
--network=ffhq-128.pkl
Note that the metrics can be quite expensive to compute (up to 1h), and many of them have an additional one-off cost for each new dataset (up to 30min). Also note that the evaluation is done using a different random seed each time, so the results will vary if the same metric is computed multiple times.
References:
@inproceedings{Chan2022,
author = {Eric R. Chan and Connor Z. Lin and Matthew A. Chan and Koki Nagano and Boxiao Pan and Shalini De Mello and Orazio Gallo and Leonidas Guibas and Jonathan Tremblay and Sameh Khamis and Tero Karras and Gordon Wetzstein},
title = {Efficient Geometry-aware {3D} Generative Adversarial Networks},
booktitle = {CVPR},
year = {2022}
}
This is a research reference implementation and is treated as a one-time code drop. As such, we do not accept outside code contributions in the form of pull requests.
We thank David Luebke, Jan Kautz, Jaewoo Seo, Jonathan Granskog, Simon Yuen, Alex Evans, Stan Birchfield, Alexander Bergman, and Joy Hsu for feedback on drafts, Alex Chan, Giap Nguyen, and Trevor Chan for help with diagrams, and Colette Kress and Bryan Catanzaro for allowing use of their photographs. This project was in part supported by Stanford HAI and a Samsung GRO. Koki Nagano and Eric Chan were partially supported by DARPA’s Semantic Forensics (SemaFor) contract (HR0011-20-3-0005). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. Distribution Statement "A" (Approved for Public Release, Distribution Unlimited).