Code release for our paper
Understanding 3D Object Articulation in Internet Videos
Shengyi Qian, Linyi Jin, Chris Rockwell, Siyi Chen, David Fouhey
CVPR 2022
Please check the project page for more details and consider citing our paper if it is helpful:
@inproceedings{Qian22,
author = {Shengyi Qian and Linyi Jin and Chris Rockwell and Siyi Chen and David F. Fouhey},
title = {Understanding 3D Object Articulation in Internet Videos},
booktitle = {CVPR},
year = 2022
}
We use pyenv to set up the Anaconda environment. This setup is tested with PyTorch 1.7.1, detectron2 0.4, and PyTorch3D 0.4.0.
VERSION_ALIAS="articulation3d" PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install anaconda3-2020.11
# pytorch and pytorch3d
conda install -c pytorch pytorch=1.7.1 torchvision cudatoolkit=10.2
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install -c bottler nvidiacub
conda install pytorch3d -c pytorch3d
# detectron2 with pytorch 1.7, cuda 10.2
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.7/index.html
Alternatively, you can use a conda virtual environment. This setup is tested with PyTorch 1.12.1, detectron2 0.6, and PyTorch3D 0.7.0.
conda create -n articulation3d python=3.8
conda activate articulation3d
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
pip install 'git+https://github.com/facebookresearch/pytorch3d.git@stable'
pip install 'git+https://github.com/facebookresearch/detectron2.git'
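After either setup, it can help to confirm that the installed versions match one of the tested combinations listed above. The following is a minimal sketch, not part of the released code. It assumes the three packages expose standard distribution metadata under the names torch, detectron2, and pytorch3d, and the helper names (TESTED, installed_version, compare) are hypothetical.

```python
from importlib import metadata

# Version combinations this README reports as tested.
# The keys "pyenv" and "conda" are labels chosen for this sketch.
TESTED = {
    "pyenv": {"torch": "1.7.1", "detectron2": "0.4", "pytorch3d": "0.4.0"},
    "conda": {"torch": "1.12.1", "detectron2": "0.6", "pytorch3d": "0.7.0"},
}


def installed_version(package, lookup=metadata.version):
    """Return the installed version string, or None if the package is missing."""
    try:
        return lookup(package)
    except metadata.PackageNotFoundError:
        return None


def compare(expected, lookup=metadata.version):
    """Map each package name to (installed_version, matches_expected)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package, lookup)
        # Drop a local build suffix such as "+cu113" before comparing.
        matches = found is not None and found.split("+")[0] == wanted
        report[package] = (found, matches)
    return report


if __name__ == "__main__":
    for package, (found, ok) in compare(TESTED["conda"]).items():
        print(f"{package}: installed={found} matches_tested={ok}")
```

A mismatch does not necessarily mean the code will fail; it only means the combination differs from the ones reported as tested.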
To install the remaining Python packages,
# other packages
pip install scikit-image matplotlib imageio plotly opencv-python
pip install mapbox-earcut
pip install numpy-quaternion
pip install imageio-ffmpeg
pip install scikit-learn
# install articulation3d
cd articulation3d
pip install -e .
Create exps for all experiments.
mkdir exps
If necessary, download our pretrained model and put it at exps/model_final.pth
Our Internet video dataset can be downloaded on the project website. Put annotations under datasets. It should look like articulation3d/datasets/articulation/cached_set_train.json.
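Before running anything, a quick check that the files sit where the instructions above expect them can save a failed run. This is a minimal sketch, assuming it is run from the repository root; the names EXPECTED and missing_paths are hypothetical and not part of the released code. Other paths, such as additional annotation files, can be passed through the expected argument.

```python
from pathlib import Path

# Paths named in this README, relative to the repository root.
EXPECTED = [
    "exps/model_final.pth",
    "datasets/articulation/cached_set_train.json",
]


def missing_paths(repo_root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    for rel in missing:
        print("missing:", rel)
    if not missing:
        print("layout looks complete")
```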
Our supplemental ScanNet dataset with synthetic humans can also be downloaded on the project website. Put it under datasets. It should look like articulation3d/datasets/scannet_surreal/cached_set_train.json.
To run the model and temporal optimization on a video,
python tools/inference.py --config config/config.yaml --input example.mp4 --output output
To save the 3D model, add the --save-obj and --webvis flags,
python tools/inference.py --config config/config.yaml --input example.mp4 --output output --save-obj --webvis
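To process several videos, the command above can be assembled programmatically. The sketch below uses only the flags shown in this README. The helper names (build_inference_cmd, run_on_videos) and the choice of one output subdirectory per video are assumptions made for illustration, not behavior of the released code.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_cmd(video, output_dir, config="config/config.yaml",
                        save_obj=False, webvis=False):
    """Build the argument list for tools/inference.py."""
    cmd = [sys.executable, "tools/inference.py",
           "--config", config,
           "--input", str(video),
           "--output", str(output_dir)]
    if save_obj:
        cmd.append("--save-obj")
    if webvis:
        cmd.append("--webvis")
    return cmd


def run_on_videos(videos, output_root="output", runner=subprocess.check_call,
                  **flags):
    """Run inference on each video and return the commands that were issued.

    Assumption of this sketch: each video gets its own subdirectory of
    output_root, named after the video file without its extension.
    """
    commands = []
    for video in videos:
        out_dir = Path(output_root) / Path(video).stem
        cmd = build_inference_cmd(video, out_dir, **flags)
        runner(cmd)  # check_call raises if the script exits with an error
        commands.append(cmd)
    return commands


# Example (run from the repository root):
# run_on_videos(["example.mp4"], save_obj=True, webvis=True)
```

Passing a different runner, for example print, shows the commands without executing them.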
Our training consists of three stages.
In the first stage, we train bounding box detection on Internet videos.
python tools/train_net.py --config-file config/step1_bbox.yaml
In the second stage, we train the articulation axis on Internet videos while freezing the backbone.
python tools/train_net.py --config-file config/step2_axis.yaml
In the final stage, we train the plane head on ScanNet images.
python tools/train_net.py --config-file config/step3_plane.yaml
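The three stages can be chained so that a later stage only starts if the earlier one exits cleanly. This is a minimal sketch using the three commands above; STAGE_CONFIGS, stage_command, and run_stages are hypothetical names. Whether each stage picks up the weights of the previous one is determined by the config files, not by this script.

```python
import subprocess
import sys

# Config files for the three training stages, in the order given above.
STAGE_CONFIGS = [
    "config/step1_bbox.yaml",
    "config/step2_axis.yaml",
    "config/step3_plane.yaml",
]


def stage_command(config):
    """Build the argument list that trains one stage."""
    return [sys.executable, "tools/train_net.py", "--config-file", config]


def run_stages(configs=STAGE_CONFIGS, runner=subprocess.call):
    """Run stages in order; stop at the first nonzero exit code.

    Returns the list of configs whose stage completed successfully.
    """
    completed = []
    for config in configs:
        if runner(stage_command(config)) != 0:
            break
        completed.append(config)
    return completed


# Example (run from the repository root):
# done = run_stages()
# print("completed stages:", done)
```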
For evaluation, run
python tools/opt_arti.py --config-file config/config.yaml --input <pth_file> --output output
We reuse the codebase of SparsePlanes and Mesh R-CNN.