This repository contains the official implementation of the paper "Category Query Learning for Human-Object Interaction Classification" (CVPR 2023).
Overview of our method
Unlike most previous HOI methods, which focus on learning better human-object features, we propose a novel and complementary approach called category query learning. The queries are explicitly associated with interaction categories, converted to image-specific category representations via a transformer decoder, and learned via an auxiliary image-level classification task.
You can install the required Python libraries with this command:
pip install -r requirements.txt
Actually, torch>=1.5.1 is probably fine, but we have not tested all versions.
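To confirm that an installed torch meets the 1.5.1 floor mentioned above, a small check along these lines may help. This is only a sketch: the helper names `version_tuple` and `meets_minimum` are hypothetical and not part of this repository, and the comparison ignores build suffixes such as `+cu117`.

```python
import re


def version_tuple(version: str) -> tuple:
    """Turn a string such as '1.13.1+cu117' into (1, 13, 1)."""
    core = version.split("+", 1)[0]  # drop local build suffix, if any
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "1.5.1") -> bool:
    """Return True when `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        status = "ok" if meets_minimum(torch.__version__) else "older than 1.5.1"
        print(torch.__version__, status)
```

Passing this check only says the version is new enough; as noted above, not every version has been tested.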
The images of the HICO-DET dataset can be downloaded here. After downloading, unpack the tarball (hico_20160224_det.tar.gz) to the data directory.
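If you prefer to unpack from Python rather than from the shell, the step can be sketched with the standard `tarfile` module. The function name `unpack_tarball` is a hypothetical helper, and the archive location is whatever path you downloaded it to.

```python
import tarfile
from pathlib import Path


def unpack_tarball(archive: str, dest: str = "data") -> list:
    """Extract `archive` into `dest` and return its top-level entry names."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:*") as tar:  # "r:*" auto-detects compression
        names = tar.getnames()
        tar.extractall(dest_path)
    # Collapse member paths to their first component.
    return sorted({name.split("/", 1)[0] for name in names})


# Example (assumes the tarball sits in the current directory):
# print(unpack_tarball("hico_20160224_det.tar.gz", "data"))
```

The returned list is a quick way to see which top-level directory the archive created under data.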
The annotation files can be downloaded from here. Note that the training annotation differs slightly from previous versions because some HOI instances with the same human and object are merged into one, which may affect performance by about 0.1 mAP. The downloaded annotation files should be placed as follows.
qpic
 |─ data
 |   └─ hico_20160224_det
 |       |─ annotations
 |       |   |─ trainval_hico.json
 |       |   |─ test_hico.json
 |       |   └─ corre_hico.npy
 :       :
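Before training, it can be useful to confirm that the annotation files are where the layout above expects them. The following is a minimal sketch; `missing_files` and the two constants are hypothetical names introduced here, with the paths copied from the tree above.

```python
from pathlib import Path

# Paths taken from the directory layout above, relative to the repository root.
HICO_ANNOTATION_DIR = "data/hico_20160224_det/annotations"
HICO_ANNOTATION_FILES = ("trainval_hico.json", "test_hico.json", "corre_hico.npy")


def missing_files(root: str, relative_dir: str, names) -> list:
    """Return the expected files that do not exist under root/relative_dir."""
    base = Path(root) / relative_dir
    return [str(base / name) for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".", HICO_ANNOTATION_DIR, HICO_ANNOTATION_FILES)
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("HICO-DET annotation layout looks complete.")
```

The same helper can be pointed at the V-COCO files described later in this README by passing a different directory and file list.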
First clone the V-COCO repository from here, and then follow its instructions to generate the file instances_vcoco_all_2014.json. Next, download the prior file prior.pickle from here. Place the files and make directories as follows.
qpic
 |─ data
 |   └─ v-coco
 |       |─ data
 |       |   |─ instances_vcoco_all_2014.json
 |       |   :
 |       |─ prior.pickle
 |       |─ images
 |       |   |─ train2014
 |       |   |   |─ COCO_train2014_000000000009.jpg
 |       |   |   :
 |       |   └─ val2014
 |       |       |─ COCO_val2014_000000000042.jpg
 |       |       :
 |       |─ annotations
 :       :
Following most previous methods, the annotation files have to be converted to the HOIA format. The conversion can be run as follows.
PYTHONPATH=data/v-coco \
python convert_vcoco_annotations.py \
--load_path data/v-coco/data \
--prior_path data/v-coco/prior.pickle \
--save_path data/v-coco/annotations
Note that only Python 2 can be used for this conversion because vsrl_utils.py in the v-coco repository raises an error under Python 3.
The V-COCO annotations in the HOIA format (corre_vcoco.npy, test_vcoco.json, and trainval_vcoco.json) will be generated in the annotations directory.
We also use the COCO-pretrained parameters of DETR, as in previous methods. You can download the DETR parameters from here for the ResNet50 backbone, and here for the ResNet101 backbone. For HICO-DET, convert the parameters with the following command.
python convert_parameters.py \
--load_path params/detr-r50-e632da11.pth \
--save_path params/detr-r50-pre-hico.pth
For V-COCO, convert the pre-trained parameters with the following command.
python convert_parameters.py \
--load_path params/detr-r50-e632da11.pth \
--save_path params/detr-r50-pre-vcoco.pth \
--dataset vcoco
After the preparation, you can start the training with the following command.
For HICO-DET training:
bash cfg/hicodet.sh
See this script for details about the training configuration on HICO-DET.
For V-COCO training:
bash cfg/vcoco.sh
See this script for details about the training configuration on V-COCO.
Evaluation is run automatically at the end of each epoch during training. The results are written to the log.txt file in the output directory. However, they are not computed with the standard evaluation tools of HICO-DET and V-COCO. Obtaining the standard results requires the additional steps described below.
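To pull the per-epoch results out of log.txt, something like the following sketch may be enough. It rests on two assumptions that are not confirmed by this README: that log.txt holds one JSON object per line (the convention used by DETR's training script), and that the metric of interest is stored under a key such as `test_mAP`, which is a hypothetical name here. Check a line of your own log.txt and adjust the key accordingly.

```python
import json


def read_log(path: str) -> list:
    """Parse a log file with one JSON object per line, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def best_record(records: list, key: str = "test_mAP"):
    """Return the record with the highest value for `key`, or None if absent.

    The default key name is an assumption; it is not documented in this README.
    """
    scored = [r for r in records if key in r]
    if not scored:
        return None
    return max(scored, key=lambda r: r[key])


# Example (the output directory name is a placeholder):
# best = best_record(read_log("path/to/output/log.txt"))
# print(best)
```

Keep in mind that these logged numbers are the in-training estimates, not the results of the standard evaluation tools.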
On HICO-DET, you can also conduct the evaluation with trained parameters as follows.
bash cfg/eval_hicodet.sh
Then we perform the official evaluation on HICO-DET. For this, we follow the evaluation steps introduced in PPDM.
For the official evaluation of V-COCO, a pickle file of detection results has to be generated. You can generate the file as follows.
bash cfg/eval_vcoco.sh
Then we perform the official evaluation steps on V-COCO with this pickle file using the official evaluation toolkit. You can refer to CDN for detailed steps.
HICO-DET:
Method | Full (D) | Rare (D) | Non-rare (D) | Full (KO) | Rare (KO) | Non-rare (KO) | Checkpoint | Config | Training log |
---|---|---|---|---|---|---|---|---|---|
QPIC+CQL | 31.07 | 25.21 | 32.82 | 33.74 | 28.03 | 35.45 | ckpt | cfg | log |
QPIC+CQL* | 31.72 | 27.40 | 33.01 | 34.25 | 28.77 | 35.89 | ckpt | cfg | log |
D: Default, KO: Known object. *: An additional trick is applied in this setting. Please see the config for more details.
Note that the results in the training log are not exactly equal to the standard evaluation results. Please see the evaluation part above.
V-COCO:
Method | Scenario 1 | Scenario 2 | Checkpoint | Config | Training log |
---|---|---|---|---|---|
QPIC+CQL | 63.57 | 65.70 | ckpt | cfg | log |
Please kindly consider citing our paper if it helps your research.
@inproceedings{xie2023category,
title={Category Query Learning for Human-Object Interaction Classification},
author={Xie, Chi and Zeng, Fangao and Hu, Yue and Liang, Shuang and Wei, Yichen},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={15275--15284},
year={2023}
}
Parts of this code are built upon several prior works, including DETR, QPIC, and GEN-VLKT.