Oytun Ulutan*, A S M Iftekhar*, B S Manjunath.
Official repository of our CVPR 2020 paper.
If you find this work useful, please consider citing our paper:
@InProceedings{Ulutan_2020_CVPR,
author = {Ulutan, Oytun and Iftekhar, A S M and Manjunath, B. S.},
title = {VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
Method | mAP (Scenario 1) |
---|---|
InteractNet | 40.0 |
Kolesnikov et al. | 41.0 |
GPNN | 44.0 |
iCAN | 45.3 |
Li et al. | 47.8 |
VSGNet | 51.8 |
Object Detector Pre-trained on COCO
Method | mAP (Full) | mAP (Rare) | mAP (Non-Rare)
---|---|---|---
HO-RCNN | 7.81 | 5.37 | 8.54
InteractNet | 9.94 | 7.16 | 10.77
GPNN | 10.61 | 7.78 | 11.45
iCAN | 14.84 | 10.45 | 16.15
Li et al. | 17.03 | 13.42 | 18.11
VSGNet | 19.8 | 16.05 | 20.91
Object Detector Fine-Tuned on HICO
We use the object detection results from DRG.
Method | mAP (Full) | mAP (Rare) | mAP (Non-Rare)
---|---|---|---
UniDet | 17.58 | 11.72 | 19.33
IP-Net | 19.56 | 12.79 | 21.58
PPDM | 21.10 | 14.46 | 23.09
Functional | 21.96 | 16.43 | 23.62
VCL | 23.63 | 17.21 | 25.55
ConsNet | 24.39 | 17.10 | 26.56
DRG | 24.53 | 19.47 | 26.04
IDN | 26.29 | 22.61 | 27.39
VSGNet | 26.54 | 21.26 | 28.12
git clone --recursive https://github.com/ASMIftekhar/VSGNet.git
bash download_data.sh
You need the wget and unzip packages to run this script. Alternatively, you can download the data from here. Running the script creates two folders in the directory, "All_data" and "infos", which together take close to 10GB of space and contain both datasets and all the essential files. If you only want to work with v-coco, download "All_data_vcoco" from the link.
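Before running download_data.sh, a quick check can save a failed download. The sketch below is a hypothetical helper (not part of the repository) that uses only the Python standard library to confirm that wget and unzip are on the PATH and, once the script has run, that the "All_data" and "infos" folders exist. The `repo_root` argument is an assumption about where you cloned the repo.

```python
import os
import shutil


def missing_tools(tools=("wget", "unzip")):
    """Return the command-line tools from `tools` that are not on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_data_folders(repo_root, folders=("All_data", "infos")):
    """Return the expected data folders that do not exist under repo_root."""
    return [name for name in folders
            if not os.path.isdir(os.path.join(repo_root, name))]


if __name__ == "__main__":
    tools = missing_tools()
    if tools:
        print("Install these before running download_data.sh:", ", ".join(tools))
    folders = missing_data_folders(".")
    if folders:
        print("Not found yet (run download_data.sh first?):", ", ".join(folders))
```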
Inside the All_data folder you will find the following subdirectories.
a. Data_vcoco: contains all training and validation images of v-coco in the train2014 subdirectory and all test images of v-coco in the val2014 subdirectory.
b. Annotations_vcoco: contains the annotations of the training, validation and test sets in three json files. The annotations are taken from the v-coco API and converted into a more convenient format. For example, consider a single image annotated with two verbs, "smile" and "hold", along with the corresponding person and object bounding boxes. The annotation for this image is arranged as follows:
{image_id:[{'Verbs': 'hold',
'object': {'obj_bbx': [305.84, 59.12, 362.34, 205.22]},
'person_bbx': [0.0, 0.63, 441.03, 368.86]},
{'Verbs': 'smile',
'object': {'obj_bbx': []},
'person_bbx': [0.0, 0.63, 441.03, 368.86]}]}
c. Object_Detections_vcoco: contains all object detection results for v-coco.
d. v-coco: contains the original v-coco API, which is needed for evaluation.
e. Data_hico: contains all training images of HICO_DET in the train2015 subdirectory and all test images of HICO_DET in the test2015 subdirectory.
f. Annotations_hico: same as folder (b), but for the HICO_DET dataset.
g. Object_Detections_hico: same as folder (c), but for the HICO_DET dataset.
h. bad_Detections_hico: contains the list of HICO_DET images in which our object detector fails to detect any person or object.
i. hico_infos: contains additional files required for training and testing on HICO_DET.
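The annotation format in (b) stores one entry per (person, verb) pair, with an empty obj_bbx list when the verb has no object (as for "smile" in the example). The sketch below is a hypothetical helper, not code from the repository, that groups these entries by person box. It assumes the json files load into exactly the dictionary shape shown in (b); note that json object keys are always strings, so the image id "42" used here is a made-up string key.

```python
import json
from collections import defaultdict


def group_by_person(annotations, image_id):
    """Map each person box of one image to its list of (verb, object_box) pairs.

    object_box is None when the entry's obj_bbx list is empty, i.e. the verb
    has no associated object (like "smile").
    """
    grouped = defaultdict(list)
    for entry in annotations[image_id]:
        person = tuple(entry['person_bbx'])  # tuples are hashable dict keys
        obj = entry['object']['obj_bbx']
        grouped[person].append((entry['Verbs'], tuple(obj) if obj else None))
    return dict(grouped)


def load_annotations(path):
    """Load one annotation json file (assumed to follow the format in (b))."""
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    example = {"42": [
        {'Verbs': 'hold',
         'object': {'obj_bbx': [305.84, 59.12, 362.34, 205.22]},
         'person_bbx': [0.0, 0.63, 441.03, 368.86]},
        {'Verbs': 'smile',
         'object': {'obj_bbx': []},
         'person_bbx': [0.0, 0.63, 441.03, 368.86]},
    ]}
    print(group_by_person(example, "42"))
```

Grouping by person box makes it easy to see every interaction one person takes part in, which is how human-object interaction pairs are usually inspected.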
To install all required packages (preferably in a python2 virtual environment):
pip2 install -r requirements.txt
HICO_DET evaluation uses a python3 environment. To install its packages (preferably in a python3 virtual environment):
pip3 install -r requirements3.txt
Run only compute_map.sh in the python3 environment; use the python2 environment for everything else.
If you do not move the "All_data" folder out of the main directory, you don't need to do anything else to set up the repo. Otherwise, run setup.py with the new location of All_data. For example, if you move it to /media/ssd2 and rename it to "data", execute the following command:
python2 setup.py -d /media/ssd2/data/
To download the pre-trained models for the results reported in the paper:
bash download_res.sh
This stores the v-coco model in the 'soa_paper' folder and the HICO_DET model in the 'soa_paper_hico' folder. Alternatively, you can download the models from here.
To store the best result in v-coco format, run (inside "scripts/"):
CUDA_VISIBLE_DEVICES=0 python2 main.py -fw soa_paper -ba 8 -r t -i t
You can use as many GPUs as you wish; just add their IDs to CUDA_VISIBLE_DEVICES in the command above (e.g. CUDA_VISIBLE_DEVICES=0,1).
The output shown in the console is the average precision on the test set, computed without considering bounding boxes.
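For intuition about that console number: average precision summarizes a ranked list of scores as the mean of the precision values measured at the rank of each true positive. The sketch below is a standard non-interpolated AP over (score, label) pairs, given as an assumption; the exact AP routine in main.py may differ (for example in interpolation or tie handling), and, like the console output, it ignores bounding boxes, unlike the official v-coco evaluation.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at the rank of each positive.

    scores: confidence for each candidate (higher means more likely positive).
    labels: 1 for a ground-truth positive, 0 otherwise.
    Returns 0.0 when there are no positives.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    num_pos = sum(labels)
    if num_pos == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # float() keeps the division correct under python2 as well.
            precision_sum += float(hits) / rank
    return precision_sum / num_pos


if __name__ == "__main__":
    # Positives ranked 1st and 3rd: (1/1 + 2/3) / 2 = 0.8333...
    print(average_precision([0.9, 0.8, 0.7, 0.1], [1, 0, 1, 0]))
```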
To see the results under the original v-coco evaluation scheme:
python2 calculate_map_vcoco.py -fw soa_paper -sa 34 -t test
To store the best result in HICO_DET format, run (inside "scripts_hico/"):
CUDA_VISIBLE_DEVICES=0 python2 main.py -fw soa_paper_hico -ba 8 -r t -i t
You can use as many GPUs as you wish; just add their IDs to CUDA_VISIBLE_DEVICES in the command above (e.g. CUDA_VISIBLE_DEVICES=0,1).
The output shown in the console is the average precision on the test set, computed without considering bounding boxes.
To see the results under the original HICO_DET evaluation scheme, run (inside "scripts_hico/HICO_eval/"):
bash compute_map.sh soa_paper_hico 20
The evaluation code has been adapted from the No-Frills repository. Here, 20 is the number of CPU cores used for evaluation; change it to suit your system.
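The core count matters because mAP is an average of independent per-class APs, so classes can be evaluated in parallel. As an illustration only (this is not the No-Frills or VSGNet evaluation code), a process pool with a configurable number of workers could look like the sketch below. It assumes per-class (scores, labels) lists have already been matched against the ground truth, and it falls back to a plain loop when a single worker is requested.

```python
from multiprocessing import Pool


def class_ap(args):
    """Non-interpolated AP for one class from a (scores, labels) pair."""
    scores, labels = args
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    num_pos = sum(labels)
    if num_pos == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += float(hits) / rank
    return total / num_pos


def mean_ap(per_class, num_workers=20):
    """Evaluate each class (in parallel if num_workers > 1) and average the APs."""
    if not per_class:
        return 0.0, []
    if num_workers <= 1:
        aps = [class_ap(item) for item in per_class]
    else:
        with Pool(processes=num_workers) as pool:
            aps = pool.map(class_ap, per_class)
    return sum(aps) / len(aps), aps


if __name__ == "__main__":
    data = [([0.9, 0.1], [1, 0]), ([0.3, 0.6], [1, 0])]
    print(mean_ap(data, num_workers=2))
```

Because each class is scored independently, more workers shorten wall-clock time but do not change the resulting mAP.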
To train the model from scratch (inside "scripts/"):
CUDA_VISIBLE_DEVICES=0 python2 main.py -fw new_test -ba 8 -l 0.001 -e 80 -sa 20
Flag descriptions:
-fw: Name of the folder in which the result will be stored.
-ba: Batch size.
-l: Learning rate.
-e: Number of epochs.
-sa: Number of epochs after which the model is saved. By default, the best model is saved at every epoch; use this flag if you want to store the model at a particular epoch.
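To make the flag semantics concrete, here is a hypothetical argparse parser that mirrors the training flags listed above. It is not the parser in main.py (consult main.py for the real definitions and for the -r/-i flags); the types, and the defaults taken from the example training command, are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented training flags."""
    parser = argparse.ArgumentParser(description="VSGNet training flags (sketch)")
    parser.add_argument("-fw", type=str, required=True,
                        help="folder in which the results are stored")
    parser.add_argument("-ba", type=int, default=8, help="batch size")
    parser.add_argument("-l", type=float, default=0.001, help="learning rate")
    parser.add_argument("-e", type=int, default=80, help="number of epochs")
    parser.add_argument("-sa", type=int, default=None,
                        help="number of epochs after which the model is saved")
    return parser


if __name__ == "__main__":
    # Same flags as the example training command.
    args = build_parser().parse_args(
        ["-fw", "new_test", "-ba", "8", "-l", "0.001", "-e", "80", "-sa", "20"])
    print(args)
```

With single-dash multi-letter options such as -fw, argparse stores the value under the option name without the dash (args.fw, args.ba, and so on).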
For more details on the flags, consult main.py. The example above uses typical hyperparameter settings; the model normally converges within 40 epochs. Again, you can use as many GPUs as you wish by adding their IDs to CUDA_VISIBLE_DEVICES. After training, to store the results in v-coco format (inside "scripts/"):
CUDA_VISIBLE_DEVICES=0 python2 main.py -fw new_test -ba 8 -r t -i t
Suppose the best result is achieved at the 30th epoch. To evaluate it under the original V-COCO scheme (inside "scripts/"):
python2 calculate_map_vcoco.py -fw new_test -sa 30 -t test
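The two v-coco evaluation steps always come as a pair: store the results, then score the chosen epoch. The sketch below is a hypothetical helper, not part of the repository, that chains the two documented commands. It assumes it is run from inside "scripts/" with python2 on the PATH, and it selects GPUs through the standard CUDA_VISIBLE_DEVICES variable (a comma-separated list of device IDs). `dry_run=True` only returns the commands, which is handy for checking them before a long run.

```python
import os
import subprocess


def gpu_env(gpu_ids, base_env=None):
    """Copy of the environment with CUDA_VISIBLE_DEVICES set to gpu_ids."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def vcoco_eval_commands(folder, epoch, batch_size=8):
    """The two documented commands: store results, then score one epoch."""
    store = ["python2", "main.py", "-fw", folder, "-ba", str(batch_size),
             "-r", "t", "-i", "t"]
    score = ["python2", "calculate_map_vcoco.py", "-fw", folder,
             "-sa", str(epoch), "-t", "test"]
    return [store, score]


def run_vcoco_eval(folder, epoch, gpu_ids=(0,), dry_run=False):
    """Run both steps in order, stopping if either one fails."""
    commands = vcoco_eval_commands(folder, epoch)
    if not dry_run:
        env = gpu_env(gpu_ids)
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, env=env, check=True)
    return commands


if __name__ == "__main__":
    # Print the commands for the example above instead of running them.
    for cmd in run_vcoco_eval("new_test", 30, gpu_ids=[0, 1], dry_run=True):
        print(" ".join(cmd))
```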
To train the model from scratch (inside "scripts_hico/"):
CUDA_VISIBLE_DEVICES=0 python2 main.py -fw new_test -ba 8 -l 0.001 -e 80 -sa 20
The flags are the same as for v-coco. The model normally converges within 30 epochs. Again, you can use as many GPUs as you wish by adding their IDs to CUDA_VISIBLE_DEVICES. We used four RTX 2080 Ti GPUs to train on HICO_DET with a batch size of 8 per GPU; each epoch takes around 40 minutes.
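The figures above give a rough compute budget for HICO_DET training. The arithmetic below is only a back-of-the-envelope estimate built from the stated numbers (4 GPUs, 8 samples per GPU, about 40 minutes per epoch, convergence within about 30 epochs); actual times depend on your hardware and disk I/O.

```python
def training_budget(num_gpus=4, batch_per_gpu=8, minutes_per_epoch=40, epochs=30):
    """Return (effective batch size, estimated training hours)."""
    effective_batch = num_gpus * batch_per_gpu
    hours = minutes_per_epoch * epochs / 60.0
    return effective_batch, hours


if __name__ == "__main__":
    batch, hours = training_budget()
    # 4 x 8 = 32 samples per step; 30 epochs x 40 min = 1200 min = 20 h.
    print("effective batch:", batch, "estimated hours:", hours)
    # The full 80-epoch schedule from the example command is about 53 h.
    print("80 epochs:", round(training_budget(epochs=80)[1], 1), "h")
```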
After running the model, to store the results in HICO_DET format (inside "scripts_hico/"):
CUDA_VISIBLE_DEVICES=0 python2 main.py -fw new_test -ba 8 -r t -i t
To evaluate the result in original HICO_DET scheme (inside "scripts_hico/HICO_eval/"):
bash compute_map.sh new_test 20
Please contact A S M Iftekhar (iftekhar@ucsb.edu) for any queries.