In IJCV 2023
Shangzhe Wu*, Tomas Jakab*, Christian Rupprecht, Andrea Vedaldi (*equal contribution)
Visual Geometry Group, University of Oxford
DOVE - Deformable Objects from VidEos. Given a collection of video clips of an object category as training data, we learn a model that predicts a textured, articulated 3D mesh from a single image of the object.
conda env create -f environment.yml
or manually:
conda install -c conda-forge matplotlib=3.3.1 opencv=3.4.2 scikit-image=0.17.2 pyyaml=5.4.1 tensorboard=2.7.0 trimesh=3.9.35 configargparse=1.2.3 einops=0.3.2 moviepy=1.0.1
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
Note: The code is tested with PyTorch 1.6.0 and CUDA 10.1.
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install -c bottler nvidiacub
conda install -c pytorch3d pytorch3d=0.3.0
or follow the official PyTorch3D installation instructions. The code is tested with PyTorch3D 0.3.0.
pip install lpips
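The lpips package implements the LPIPS perceptual distance, presumably used here for image-based evaluation. A minimal usage sketch (the random tensors below stand in for real images, which LPIPS expects as NCHW tensors scaled to [-1, 1]):

import lpips
import torch

# LPIPS metric with the package's default AlexNet backbone.
loss_fn = lpips.LPIPS(net='alex')

# Placeholder inputs; real images should be normalized to [-1, 1].
img0 = torch.rand(1, 3, 256, 256) * 2 - 1
img1 = torch.rand(1, 3, 256, 256) * 2 - 1

d = loss_fn(img0, img1)  # perceptual distance, one value per image pair
print(d.item())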
The preprocessed datasets can be downloaded using the scripts in data/:
cd data
sh download_bird_videos.sh
sh download_horse_videos.sh
sh download_toy_birds.sh
sh download_toy_birds_raw.sh
The toy_birds dataset consists of 3D scans and real photos of 23 toy birds, which are preprocessed and used for 3D evaluation. toy_birds_raw contains all the raw captures.
The pretrained models on birds and horses can be downloaded using the scripts in results/, e.g.:
cd results/bird && sh download_pretrained_bird.sh
and
cd results/horse && sh download_pretrained_horse.sh
Check the configuration files in configs/ and run, e.g.:
python run.py --config configs/bird/train_bird.yml --gpu 0 --num_workers 4
python run.py --config configs/bird/test_bird.yml --gpu 0 --num_workers 4
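run.py reads these YAML files through configargparse (listed among the dependencies). A minimal sketch of such a config-driven entry point, keeping only the flags shown in the commands above; the actual run.py defines many more options:

import configargparse

# Sketch only: --config, --gpu and --num_workers are taken from the commands
# above; everything else in the real run.py is omitted.
parser = configargparse.ArgumentParser(
    config_file_parser_class=configargparse.YAMLConfigFileParser)
parser.add_argument('--config', is_config_file=True, help='path to a YAML config in configs/')
parser.add_argument('--gpu', type=int, default=0, help='GPU index to run on')
parser.add_argument('--num_workers', type=int, default=4, help='dataloader worker processes')
args = parser.parse_args()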
After generating the results on the bird test set (using configs/bird/test_bird.yml), check the directories and run:
python scripts/eval_mask_reprojection.py
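This evaluation compares masks of the reprojected predicted meshes against the ground-truth masks. A hedged sketch of the core IoU computation, with a hypothetical mask_iou helper and hypothetical file paths (the actual script handles directory traversal and reprojection itself):

import numpy as np
from skimage import io, img_as_float  # scikit-image is among the dependencies

def mask_iou(pred_path, gt_path, thresh=0.5):
    # Binarize both masks and compute intersection-over-union.
    pred = img_as_float(io.imread(pred_path, as_gray=True)) > thresh
    gt = img_as_float(io.imread(gt_path, as_gray=True)) > thresh
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / max(union, 1)

print(mask_iou('pred_mask.png', 'gt_mask.png'))  # hypothetical paths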
After generating the results on the toy bird test set (using configs/bird/test_bird_toy.yml), check the directories and run:
python scripts/eval_3d_toy_bird.py
Note: The canonical pose may be facing either towards or away from the camera, as both are valid solutions. The current script assumes the canonical pose is facing away from the camera, hence line 157, which rotates the mesh 180° to roughly align it with the ground-truth scans. You might need to inspect the results and adjust accordingly.
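A minimal sketch of that flip together with a symmetric Chamfer distance, assuming trimesh meshes and hypothetical paths; the actual script additionally aligns and rescales the meshes before comparison:

import numpy as np
import trimesh
from scipy.spatial import cKDTree

# Hypothetical paths; point these at the actual results and scans.
pred = trimesh.load('results/bird/test_toy/mesh.obj')
gt = trimesh.load('data/toy_birds/scans/scan.obj')

# Flip the prediction 180 degrees about the vertical axis (assumed y here).
flip = trimesh.transformations.rotation_matrix(np.pi, [0, 1, 0])
pred.apply_transform(flip)

# Symmetric Chamfer distance between points sampled on both surfaces.
p, g = pred.sample(10000), gt.sample(10000)
chamfer = cKDTree(g).query(p)[0].mean() + cKDTree(p).query(g)[0].mean()
print(chamfer)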
After generating the test results, check the directories and run:
python scripts/render_visual.py
There are multiple modes of visualization specified by render_mode, including novel views, rotations, and animations. Check the script for details.
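For instance, the rotation mode presumably sweeps the camera around the object; a turntable of such camera poses can be generated with PyTorch3D as follows (distance and elevation values are hypothetical, not the script's actual settings):

import torch
from pytorch3d.renderer import look_at_view_transform

# One camera pose per 10 degrees of azimuth for a full turntable sweep.
azim = torch.arange(0, 360, 10.0)
R, T = look_at_view_transform(dist=2.5, elev=10.0, azim=azim)
# R: (36, 3, 3) rotations and T: (36, 3) translations, one per rendered frame.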
@Article{wu2023dove,
  title   = {{DOVE}: Learning Deformable 3D Objects by Watching Videos},
  author  = {Shangzhe Wu and Tomas Jakab and Christian Rupprecht and Andrea Vedaldi},
  journal = {IJCV},
  year    = {2023}
}