Yufei Ye, Xueting Li, Abhinav Gupta, Shalini De Mello, Stan Birchfield, Jiaming Song, Shubham Tulsiani, Sifei Liu
In CVPR 2023
TL;DR: Given a single RGB image of an object, hallucinate plausible ways for a human to interact with it.
[Project Page] [Video] [Arxiv] [Data Generation]
See install.md for installation instructions.

To run inference on the demo images:

```bash
python inference.py data.data_dir='docs/demo/*.*g' test_num=3
```
The inference script first synthesizes `test_num` HOI images in a batch and then extracts 3D hand poses from them.
Input | Synthesized HOI images | Extracted 3D Hand Pose |
---|---|---|
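
To run inference on your own images, the same Hydra overrides can point to a different input glob. This is a sketch with placeholder paths; `data.data_dir`, `test_num`, and `test_name` are the same parameters used elsewhere in this README, and only the values below are assumptions:

```bash
# Sketch: inference on custom images (placeholder paths).
# data.data_dir, test_num, and test_name are the overrides used in the
# commands in this README; only the values here are hypothetical.
python inference.py \
    data.data_dir='path/to/my_images/*.png' \
    test_num=5 \
    test_name=my_run
```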
The interpolation script takes the layout parameters of the `index`-th example predicted by `inference.py` and smoothly interpolates the HOI synthesis toward the horizontally flipped parameters. To run the demo:
```bash
python -m scripts.interpolate dir=docs/demo_inter
```
This should give results similar to:
Input | Interpolated Layouts | Output |
---|---|---|
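
To interpolate a different example, `index` can presumably be passed as a command-line override as well; the override name below is an assumption based on the description above and is not verified against the script:

```bash
# Assumption: `index` selects which predicted example's layout parameters
# are interpolated; the override name is inferred, not verified.
python -m scripts.interpolate dir=docs/demo_inter index=1
```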
The following command runs guided generation with the keypoints in `docs/demo_kpts`:
```bash
python inference.py mode=hijack data.data_dir='docs/demo_kpts/*.png' test_name=hijack
```
This should give results similar to:
Input 1 | Output 1 | Input 2 | Output 2 |
---|---|---|---|
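
To guide generation with your own keypoints, the same command should work on a different directory, provided the files follow the same format as those in `docs/demo_kpts`. The path below is a placeholder:

```bash
# Sketch: keypoint-guided ("hijack") generation on custom inputs.
# The directory is hypothetical; files must match the docs/demo_kpts format.
python inference.py mode=hijack \
    data.data_dir='path/to/my_kpts/*.png' \
    test_name=my_hijack
```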
We provide the scripts to generate the HO3Pair dataset. Please see `preprocess/`.
To train LayoutNet, the pretrained GLIDE inpainting checkpoint is expected at `${environment.pretrain}/glide/base_inpaint.pt`, as specified in `configs/model/layout.yaml:resume_ckpt`. Then run:
```bash
python -m models.base -m --config-name=train \
    expname=reproduce/\${model.module} \
    model=layout
```
To train ContentNet with the GLIDE backbone, the pretrained GLIDE inpainting checkpoint is expected at `${environment.pretrain}/glide/base_inpaint.pt`, as specified in `configs/model/content_glide.yaml:resume_ckpt`. Then run:
```bash
python -m models.base -m --config-name=train \
    expname=reproduce/\${model.module} \
    model=content_glide
```
To train ContentNet with the LDM (Stable Diffusion) backbone, the pretrained inpainting checkpoint is expected at `${environment.pretrain}/stable/inpaint.ckpt`, as specified in `configs/model/content_ldm.yaml:resume_ckpt`. Then run:
```bash
python -m models.base -m --config-name=train \
    expname=reproduce/\${model.module} \
    model=content_ldm
```
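
If the pretrained checkpoints live somewhere else, the `resume_ckpt` values above are ordinary Hydra config entries and can presumably be overridden on the command line instead of editing the YAML files. The key path `model.resume_ckpt` is an assumption inferred from the config locations listed above:

```bash
# Sketch: train LayoutNet with a checkpoint in a non-default location.
# model.resume_ckpt as an override key is inferred from
# configs/model/layout.yaml:resume_ckpt; adjust if the config layout differs.
python -m models.base -m --config-name=train \
    expname=reproduce/\${model.module} \
    model=layout \
    model.resume_ckpt=/path/to/glide/base_inpaint.pt
```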
Per-category HOI4D instance splits (not used in the paper), as well as test images from HOI4D and EPIC-KITCHENS (VISOR), can be downloaded here.
This project is licensed under CC-BY-NC-SA-4.0. Redistribution and use should follow this license.
Affordance Diffusion leverages many amazing open-source projects shared in the research community:

- Files under `ldm/` are modified from this repo.
- Files under `glide_text2im/` are modified from this repo.

If you find this work helpful, please consider citing:
```bibtex
@inproceedings{ye2023affordance,
  title={Affordance Diffusion: Synthesizing Hand-Object Interactions},
  author={Yufei Ye and Xueting Li and Abhinav Gupta
    and Shalini De Mello and Stan Birchfield and Jiaming Song
    and Shubham Tulsiani and Sifei Liu},
  year={2023},
  booktitle={CVPR},
}
```