DiffusionRig: Learning Personalized Priors for Facial Appearance Editing
Zheng Ding, Xuaner Zhang,
Zhihao Xia, Lars Jebe,
Zhuowen Tu, Xiuming Zhang
CVPR 2023
arXiv / Project Page / Video / BibTex
conda create -n diffusionrig python=3.8
conda activate diffusionrig
conda install pytorch=1.11 cudatoolkit=11.3 torchvision -c pytorch
conda install mpi4py dlib scikit-learn scikit-image tqdm -c conda-forge
pip install lmdb opencv-python kornia yacs blobfile chumpy face_alignment
You also need to install pytorch3d to render the physical buffers:
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1110/download.html
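After installation, a quick import check can catch a broken CUDA or pytorch3d setup before the longer data-preparation steps. This is only a sanity-check sketch, not part of the original pipeline; the versions in the comments mirror the pinned install commands above.

```python
# Minimal environment sanity check (versions follow the install commands above).
import torch
import pytorch3d  # noqa: F401  # import fails if the wheel above is missing

print("PyTorch:", torch.__version__)        # expect 1.11.x
print("CUDA toolkit:", torch.version.cuda)  # expect 11.3
print("CUDA available:", torch.cuda.is_available())
```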
Before doing data preparation for training, please first download the source files and checkpoints of DECA to set it up (you will need to create an account to download FLAME resources):
- deca_model.tar: Visit this page to download the pretrained DECA model.
- generic_model.pkl: Visit this page to download FLAME 2020 and extract generic_model.pkl.
- FLAME_texture.npz: Visit the same page to download the FLAME texture space and extract FLAME_texture.npz.

Put all the downloaded files into the data/ folder:

data/
  deca_model.tar
  generic_model.pkl
  FLAME_texture.npz
  fixed_displacement_256.npy
  head_template.obj
  landmark_embedding.npy
  mean_texture.jpg
  texture_data_256.npy
  uv_face_eye_mask.png
  uv_face_mask.png
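To catch a missing DECA asset early, you can check the data/ folder against the list above with a few lines of Python. This is a small helper sketch, not part of the repository:

```python
from pathlib import Path

# Files DECA needs, per the data/ layout above.
REQUIRED = [
    "deca_model.tar", "generic_model.pkl", "FLAME_texture.npz",
    "fixed_displacement_256.npy", "head_template.obj",
    "landmark_embedding.npy", "mean_texture.jpg", "texture_data_256.npy",
    "uv_face_eye_mask.png", "uv_face_mask.png",
]

missing = [f for f in REQUIRED if not (Path("data") / f).exists()]
if missing:
    raise FileNotFoundError(f"Missing DECA assets in data/: {missing}")
print("All DECA assets found.")
```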
We use FFHQ to train the first stage and a personal photo album to train the second stage. Before training, you need to use DECA to extract the physical buffers for those images.
For FFHQ, run the following on the aligned images:
python scripts/create_data.py --data_dir PATH_TO_FFHQ_ALIGNED_IMAGES --output_dir ffhq256_deca.lmdb --image_size 256 --use_meanshape False
For the personal photo album (we use around 20 images per identity in our experiments), put all the images into a folder and align them by running:
python scripts/align.py -i PATH_TO_PERSONAL_PHOTO_ALBUM -o personal_images_aligned -s 256
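If you want to double-check the alignment output before building the dataset, a quick pass with OpenCV (installed above) works. The snippet below is only an illustrative check and assumes the output folder from the command above:

```python
import glob
import cv2

# Verify that every aligned crop has the expected 256x256 resolution.
for path in glob.glob("personal_images_aligned/*"):
    img = cv2.imread(path)
    if img is None:
        continue  # skip anything that is not an image
    h, w = img.shape[:2]
    assert (h, w) == (256, 256), f"{path} is {w}x{h}, expected 256x256"
print("All aligned images are 256x256.")
```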
Then, create a dataset by running:
python scripts/create_data.py --data_dir personal_images_aligned --output_dir personal_deca.lmdb --image_size 256 --use_meanshape True
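Once an .lmdb dataset has been written, you can quickly confirm it is non-empty using the standard lmdb Python API. The exact key/value layout is defined by scripts/create_data.py, so the sketch below only counts entries and prints a few keys:

```python
import lmdb

# Open the dataset read-only; pass subdir=False if it was written as a single file.
env = lmdb.open("personal_deca.lmdb", readonly=True, lock=False)
print("entries:", env.stat()["entries"])

with env.begin() as txn:
    for i, (key, _) in enumerate(txn.cursor()):
        print(key)  # keys only; the value format is defined by create_data.py
        if i >= 4:
            break
```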
Our 256x256 model uses eight GPUs for Stage 1 training with a batch size of 32 per GPU:
mpiexec -n 8 python scripts/train.py --latent_dim 64 --encoder_type resnet18 \
--log_dir log/stage1 --data_dir ffhq256_deca.lmdb --lr 1e-4 \
--p2_weight True --image_size 256 --batch_size 32 --max_steps 50000 \
--num_workers 8 --save_interval 5000 --stage 1
To keep the model training indefinitely, set --max_steps 0. If you want to resume a training process, simply add --resume_checkpoint PATH_TO_THE_MODEL.
:white_check_mark: We also provide our trained Stage 1 model here so that you can fast-forward to training your personalized model.
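If you just want to peek inside a checkpoint (the provided Stage 1 model or one of your own) without launching training, the following works on CPU. It is a minimal sketch that only assumes a standard PyTorch .pt file; the path is an example placeholder:

```python
import torch

# Load on CPU so no GPU is needed just to inspect the checkpoint.
ckpt = torch.load("log/stage1/MODEL_NAME.pt", map_location="cpu")  # example path

# Depending on how it was saved, this may be a raw state dict, a wrapper dict,
# or a full module; normalize to a state dict and print a few parameter shapes.
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt.state_dict()
for name, value in list(state.items())[:5]:
    print(name, getattr(value, "shape", type(value)))
```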
Finetune the model on your tiny personal album:
mpiexec -n 1 python scripts/train.py --latent_dim 64 --encoder_type resnet18 \
--log_dir log/stage2 --resume_checkpoint log/stage1/[MODEL_NAME].pt \
--data_dir personal_deca.lmdb --lr 1e-5 \
--p2_weight True --image_size 256 --batch_size 4 --max_steps 5000 \
--num_workers 8 --save_interval 5000 --stage 2
Stage 2 training takes around 30 minutes on a single NVIDIA V100 GPU.
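For intuition on why 5,000 steps is enough here: with an album of roughly 20 photos (as suggested above), the model already sees each image many hundreds of times. A quick back-of-the-envelope check with illustrative numbers only:

```python
album_size = 20      # ~20 photos per identity, as suggested above
batch_size = 4       # --batch_size 4
max_steps = 5000     # --max_steps 5000

samples_seen = batch_size * max_steps           # 20,000 samples drawn in total
passes_over_album = samples_seen / album_size   # ~1,000 passes over the album
print(f"~{passes_over_album:.0f} passes over the personal album")
```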
(New as of 10/13/2023): We have released some of the Stage 2 training datasets used in our paper, for research purposes only. Please download them through the Google Drive link.
We provide a script to edit face appearance by modifying the physical buffers. Run:
python scripts/inference.py --source SOURCE_IMAGE_FILE --target TARGET_IMAGE_FILE --output_dir OUTPUT_DIR --modes light --model_path PATH_TO_MODEL --meanshape PATH_TO_MEANSHAPE --timestep_respacing ddim20
to use the physical parameters (e.g., lighting, expression, or head pose) of the target image to edit the source image.
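If you want to apply edits to one source image using several different targets, the script can simply be called in a loop. The sketch below uses only the flags shown above; all paths are placeholders you would replace with your own:

```python
import subprocess
from pathlib import Path

source = "SOURCE_IMAGE_FILE"               # placeholder paths, as in the command above
targets = ["TARGET_1.png", "TARGET_2.png"]
model = "log/stage2/MODEL_NAME.pt"
meanshape = "PATH_TO_MEANSHAPE"

for target in targets:
    out_dir = Path("edits") / Path(target).stem
    subprocess.run([
        "python", "scripts/inference.py",
        "--source", source, "--target", target,
        "--output_dir", str(out_dir),
        "--modes", "light",
        "--model_path", model,
        "--meanshape", meanshape,
        "--timestep_respacing", "ddim20",
    ], check=True)
```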
If the issue is code-related, please open an issue here.
For questions, please also consider opening an issue, as it may benefit future readers. Otherwise, email Zheng Ding at zhding@ucsd.edu.
This codebase was built upon and drew inspiration from Guided-Diffusion, DECA, and Diff-AE. We thank the authors for making those repositories public.