VSGNet

VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions

Oytun Ulutan*, A S M Iftekhar*, B S Manjunath.

Official repository of our CVPR 2020 paper.

Overview of VSGNet

Citing

If you find this work useful, please consider citing our paper:

```
@InProceedings{Ulutan_2020_CVPR,
author = {Ulutan, Oytun and Iftekhar, A S M and Manjunath, B. S.},
title = {VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
```

Results on HICO-DET and V-COCO

Our results on the V-COCO dataset

| Method | mAP (Scenario 1) |
|---|---|
| InteractNet | 40.0 |
| Kolesnikov et al. | 41.0 |
| GPNN | 44.0 |
| iCAN | 45.3 |
| Li et al. | 47.8 |
| VSGNet | 51.8 |

Our results on the HICO-DET dataset

Object Detector Pre-Trained on COCO

| Method | mAP (Full) | mAP (Rare) | mAP (Non-Rare) |
|---|---|---|---|
| HO-RCNN | 7.81 | 5.37 | 8.54 |
| InteractNet | 9.94 | 7.16 | 10.77 |
| GPNN | 10.61 | 7.78 | 11.45 |
| iCAN | 14.84 | 10.45 | 16.15 |
| Li et al. | 17.03 | 13.42 | 18.11 |
| VSGNet | 19.80 | 16.05 | 20.91 |

Object Detector Fine-Tuned on HICO

We use the object detection results from DRG.

| Method | mAP (Full) | mAP (Rare) | mAP (Non-Rare) |
|---|---|---|---|
| UniDet | 17.58 | 11.72 | 19.33 |
| IP-Net | 19.56 | 12.79 | 21.58 |
| PPDM | 21.10 | 14.46 | 23.09 |
| Functional | 21.96 | 16.43 | 23.62 |
| VCL | 23.63 | 17.21 | 25.55 |
| ConsNet | 24.39 | 17.10 | 26.56 |
| DRG | 24.53 | 19.47 | 26.04 |
| IDN | 26.29 | 22.61 | 27.39 |
| VSGNet | 26.54 | 21.26 | 28.12 |

Installation

Clone repository (recursively):

```
git clone --recursive https://github.com/ASMIftekhar/VSGNet.git
```

Download the data, annotations, and object detection results:
```
bash download_data.sh
```
The script requires the wget and unzip packages. Alternatively, you can download the data from here. Running the script creates two folders in the main directory, "All_data" and "infos", which together take close to 10 GB of space and contain both datasets and all the essential files. If you only want to work with V-COCO, download "All_data_vcoco" from the same link instead.
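
The script's requirements can be checked up front. The snippet below is a minimal Python 3 sketch (a hypothetical helper, not part of the repository) that reports whether wget and unzip are on the PATH and whether a directory has roughly the 10 GB of free space mentioned above:

```python
import shutil

# Roughly the space needed for "All_data" and "infos" (approximate, per the note above).
REQUIRED_BYTES = 10 * 10**9


def check_download_prerequisites(target_dir="."):
    """Report tool availability and free disk space before running download_data.sh."""
    free = shutil.disk_usage(target_dir).free
    return {
        "wget": shutil.which("wget") is not None,     # True if wget is on PATH
        "unzip": shutil.which("unzip") is not None,   # True if unzip is on PATH
        "enough_space": free >= REQUIRED_BYTES,       # at least ~10 GB free
        "free_gb": round(free / 10**9, 1),            # free space in decimal GB
    }


if __name__ == "__main__":
    print(check_download_prerequisites("."))
```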

Inside the All_data folder you will find the following subdirectories.

a. Data_vcoco: Contains all training and validation images of V-COCO in the train2014 subdirectory and all test images of V-COCO in the val2014 subdirectory.

b. Annotations_vcoco: Contains the annotations for the training, validation, and test sets in three JSON files. The annotations are taken from the V-COCO API and converted into a more convenient format. For example, consider a single image annotated with two verbs, "hold" and "smile", along with the corresponding person and object bounding boxes. The annotation for this image is arranged as follows:

    {image_id: [{'Verbs': 'hold',
                 'object': {'obj_bbx': [305.84, 59.12, 362.34, 205.22]},
                 'person_bbx': [0.0, 0.63, 441.03, 368.86]},
                {'Verbs': 'smile',
                 'object': {'obj_bbx': []},
                 'person_bbx': [0.0, 0.63, 441.03, 368.86]}]}

c. Object_Detections_vcoco: Contains all object detection results for V-COCO.

d. v-coco: Contains the original V-COCO API, which is needed for evaluation.

e. Data_hico: Contains all training images of HICO-DET in the train2015 subdirectory and all test images of HICO-DET in the test2015 subdirectory.

f. Annotations_hico: Same as folder (b), but for the HICO-DET dataset.

g. Object_Detections_hico: Same as folder (c), but for the HICO-DET dataset.

h. bad_Detections_hico: Contains the list of images in HICO-DET where our object detector fails to detect any person or object.

i. hico_infos: Contains additional files required for training and testing on HICO-DET.
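
The annotation format described in (b) maps each image ID to a list of entries, one per verb. As an illustration, here is a minimal Python 3 sketch that flattens it into (image_id, verb, person box, object box) tuples. The helper names are hypothetical, the JSON file names are not given here, and an empty 'obj_bbx' list (as for "smile") is mapped to None:

```python
import json


def load_annotations(path):
    """Load one of the Annotations_vcoco JSON files (file names assumed, not listed here)."""
    with open(path) as f:
        return json.load(f)


def iter_interactions(annotations):
    """Yield (image_id, verb, person_bbx, obj_bbx) for every annotated interaction.

    obj_bbx is None when the verb has no object (an empty 'obj_bbx' list).
    """
    for image_id, entries in annotations.items():
        for entry in entries:
            obj_bbx = entry["object"]["obj_bbx"] or None
            yield image_id, entry["Verbs"], entry["person_bbx"], obj_bbx


# The example annotation from (b), with a placeholder image id (JSON object keys are strings).
example = {
    "1": [
        {"Verbs": "hold",
         "object": {"obj_bbx": [305.84, 59.12, 362.34, 205.22]},
         "person_bbx": [0.0, 0.63, 441.03, 368.86]},
        {"Verbs": "smile",
         "object": {"obj_bbx": []},
         "person_bbx": [0.0, 0.63, 441.03, 368.86]},
    ]
}

for image_id, verb, person, obj in iter_interactions(example):
    print(image_id, verb, person, obj)
```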

To install all required packages (preferably in a Python 2 virtual environment):
```
pip2 install -r requirements.txt
```
HICO-DET evaluation uses a Python 3 environment. To install its packages (preferably in a Python 3 virtual environment):
```
pip3 install -r requirements3.txt
```
Run only compute_map.sh in the Python 3 environment; use the Python 2 environment for everything else.

If you keep the "All_data" folder in the main directory, no further setup is needed. Otherwise, run setup.py with the new location of All_data. For example, if you moved it to /media/ssd2 and renamed it "data", run:
```
python2 setup.py -d /media/ssd2/data/
```
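
Before running setup.py on a relocated folder, it can help to confirm that the folder still contains the subdirectories listed earlier. This is a minimal sketch, not part of the repository: `missing_subdirs` is a hypothetical helper, the expected names are copied from the folder list, and a V-COCO-only download may contain only some of them, so pass a subset in that case:

```python
import os
import sys

# Subdirectory names taken from the All_data folder list.
EXPECTED_SUBDIRS = [
    "Data_vcoco", "Annotations_vcoco", "Object_Detections_vcoco", "v-coco",
    "Data_hico", "Annotations_hico", "Object_Detections_hico",
    "bad_Detections_hico", "hico_infos",
]


def missing_subdirs(all_data_dir, expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectories that are not present in all_data_dir."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(all_data_dir, name))]


if __name__ == "__main__":
    # e.g. python3 check_dirs.py /media/ssd2/data/
    target = sys.argv[1] if len(sys.argv) > 1 else "All_data"
    print(missing_subdirs(target))
```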

Downloading the Pre-Trained Models

To download the pre-trained models for the results reported in the paper:

```
bash download_res.sh
```

This stores the V-COCO model in the 'soa_paper' folder and the HICO-DET model in the 'soa_paper_hico' folder. Alternatively, you can download the models from here.

Evaluation in V-COCO

To store the best result in V-COCO format, run (inside "scripts/"):

```
CUDA_VISIBLE_DEVICES=0 python2 main.py -fw soa_paper -ba 8 -r t -i t
```

You can use as many GPUs as you wish; just add the corresponding GPU IDs to CUDA_VISIBLE_DEVICES in the command above (for example, CUDA_VISIBLE_DEVICES=0,1).
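
If you prefer to launch runs from Python, the same GPU selection can be expressed by setting CUDA_VISIBLE_DEVICES in the child process environment. A minimal Python 3 sketch with hypothetical helper names; it only reuses the command and flags shown above:

```python
import os
import subprocess


def build_launch(gpu_ids, script_args, script="main.py", python="python2"):
    """Return (argv, env) that run `script` with only the given GPUs visible."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    return [python, script] + list(script_args), env


def launch(gpu_ids, script_args, cwd="scripts"):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    argv, env = build_launch(gpu_ids, script_args)
    subprocess.check_call(argv, env=env, cwd=cwd)


argv, env = build_launch([0, 1], ["-fw", "soa_paper", "-ba", "8", "-r", "t", "-i", "t"])
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(argv))
# CUDA_VISIBLE_DEVICES=0,1 python2 main.py -fw soa_paper -ba 8 -r t -i t
```

Copying os.environ keeps PATH and the virtual environment settings intact while only overriding the GPU list.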

The output shown in the console is the average precision on the test set, computed without considering bounding boxes.
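
As a rough illustration of what such an average precision (AP) number measures, here is a minimal sketch of non-interpolated AP for a single class, plus the unweighted mean over classes (mAP). It is not the repository's implementation, which may differ in tie handling or interpolation; the function names are hypothetical:

```python
def average_precision(scores, labels, num_gt=None):
    """Non-interpolated AP for one class.

    scores: confidence per candidate; labels: 1 for a true positive, 0 otherwise.
    num_gt: total ground-truth positives (defaults to sum(labels)); pass it when
    some positives were never predicted, so they still count as misses.
    """
    denom = sum(labels) if num_gt is None else num_gt
    if denom == 0:
        return 0.0
    # Rank candidates from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / float(rank)  # precision at this positive's rank
    return precision_sum / denom


def mean_average_precision(per_class):
    """Unweighted mean of per-class AP; per_class maps class -> (scores, labels)."""
    aps = [average_precision(s, l) for s, l in per_class.values()]
    return sum(aps) / len(aps)


print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # approximately 0.833
```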

To compute the results under the original V-COCO evaluation scheme, run:

```
python2 calculate_map_vcoco.py -fw soa_paper -sa 34 -t test
```
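
Unlike the console AP, the original V-COCO and HICO-DET schemes take boxes into account: a predicted human-object pair counts as correct only if its interaction class is right and both its human box and its object box overlap the matching ground-truth boxes with an IoU of at least 0.5. The sketch below illustrates that criterion for boxes in the [x1, y1, x2, y2] layout of the annotation example. It is not the v-coco API or the HICO-DET evaluation code, which also handle cases such as verbs without an object and may use different pixel-offset conventions:

```python
def iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes (continuous coordinates)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pair_matches(pred_human, pred_obj, gt_human, gt_obj, thresh=0.5):
    """True if both the human and the object box reach the IoU threshold.

    Assumes the predicted and ground-truth pairs already share the same interaction class.
    """
    return iou(pred_human, gt_human) >= thresh and iou(pred_obj, gt_obj) >= thresh


person = [0.0, 0.63, 441.03, 368.86]
obj = [305.84, 59.12, 362.34, 205.22]
print(pair_matches(person, obj, person, obj))  # True
```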

Evaluation in HICO-DET

To store the best result in HICO-DET format, run (inside "scripts_hico/"):

```
CUDA_VISIBLE_DEVICES=0 python2 main.py -fw soa_paper_hico -ba 8 -r t -i t
```

You can use as many GPUs as you wish; just add the corresponding GPU IDs to CUDA_VISIBLE_DEVICES in the command above (for example, CUDA_VISIBLE_DEVICES=0,1).

The output shown in the console is the average precision on the test set, computed without considering bounding boxes.

To compute the results under the original HICO-DET evaluation scheme, run (inside "scripts_hico/HICO_eval/"):

```
bash compute_map.sh soa_paper_hico 20
```

The evaluation code is adapted from the No-Frills repository. Here, 20 is the number of CPU cores used for evaluation; change it to match your system.

Training in V-COCO

To train the model from scratch (inside "scripts/"):

```
CUDA_VISIBLE_DEVICES=0 python2 main.py -fw new_test -ba 8 -l 0.001 -e 80 -sa 20
```

Flags description:

-fw: Name of the folder in which the result will be stored.

-ba: Batch size.

-l: Learning rate.

-e: Number of epochs.

-sa: Epoch interval at which the model is saved. Note that, by default, the best model so far is saved after every epoch; use this flag if you also want to store the model at particular epochs.
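
To make the flag list concrete, here is a hedged sketch of how these documented options could be declared with Python's argparse, and how the example training command parses. It is not the parser in main.py, which defines more flags (such as the -r and -i used for evaluation); the types are inferred from the descriptions above:

```python
import argparse


def build_parser():
    """Illustrative subset of the documented training flags (not main.py's own parser)."""
    p = argparse.ArgumentParser(description="Sketch of the documented VSGNet training flags")
    p.add_argument("-fw", type=str, help="folder in which the results are stored")
    p.add_argument("-ba", type=int, help="batch size")
    p.add_argument("-l", type=float, help="learning rate")
    p.add_argument("-e", type=int, help="number of epochs")
    p.add_argument("-sa", type=int, help="epoch interval for additional checkpoints")
    return p


args = build_parser().parse_args(
    ["-fw", "new_test", "-ba", "8", "-l", "0.001", "-e", "80", "-sa", "20"])
print(args.fw, args.ba, args.l, args.e, args.sa)  # new_test 8 0.001 80 20
```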

For more details on the flags, see main.py. The example above uses typical hyperparameter settings; the model normally converges within 40 epochs. Again, you can use as many GPUs as you wish by adding their IDs to CUDA_VISIBLE_DEVICES. After training, to store the results in V-COCO format, run (inside "scripts/"):

```
CUDA_VISIBLE_DEVICES=0 python2 main.py -fw new_test -ba 8 -r t -i t
```

For example, if the best result is achieved at the 30th epoch, evaluate it under the original V-COCO scheme with (inside "scripts/"):

```
python2 calculate_map_vcoco.py -fw new_test -sa 30 -t test
```
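
Choosing the epoch to evaluate can be scripted once you have a score for each saved epoch (for example, copied from the console output). A minimal sketch with hypothetical helper names and made-up placeholder scores; the command it builds uses only the flags shown above:

```python
def best_epoch(score_by_epoch):
    """Epoch with the highest score; ties go to the earliest epoch."""
    return max(sorted(score_by_epoch), key=lambda epoch: score_by_epoch[epoch])


def vcoco_eval_command(folder, epoch):
    """argv for the V-COCO evaluation command above (run inside "scripts/")."""
    return ["python2", "calculate_map_vcoco.py", "-fw", folder,
            "-sa", str(epoch), "-t", "test"]


placeholder_scores = {20: 0.40, 30: 0.52, 40: 0.51}  # made-up numbers, not real results
print(" ".join(vcoco_eval_command("new_test", best_epoch(placeholder_scores))))
# python2 calculate_map_vcoco.py -fw new_test -sa 30 -t test
```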

Training in HICO-DET

To train the model from scratch (inside "scripts_hico/"):

```
CUDA_VISIBLE_DEVICES=0 python2 main.py -fw new_test -ba 8 -l 0.001 -e 80 -sa 20
```

The flags are the same as for V-COCO. The model normally converges within 30 epochs. Again, you can use as many GPUs as you wish by adding their IDs to CUDA_VISIBLE_DEVICES. We trained on HICO-DET with four RTX 2080 Ti GPUs and a batch size of 8 per GPU; each epoch took around 40 minutes.

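The HICO-DET figures above can be turned into a quick budget estimate. This is plain arithmetic in a hypothetical helper, assuming the per-GPU batches combine into one effective batch (as in data-parallel training) and that training stops near the 30-epoch convergence point rather than running all 80 epochs of the example command:

```python
def training_budget(num_gpus, batch_per_gpu, minutes_per_epoch, epochs):
    """Return (effective batch size, approximate wall-clock hours)."""
    effective_batch = num_gpus * batch_per_gpu       # images per optimization step
    hours = minutes_per_epoch * epochs / 60.0        # ignores evaluation overhead
    return effective_batch, hours


print(training_budget(num_gpus=4, batch_per_gpu=8, minutes_per_epoch=40, epochs=30))
# (32, 20.0)
```

So reaching convergence on HICO-DET takes roughly 20 hours on that setup, and a full 80-epoch run roughly 53 hours.
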
After training, to store the results in HICO-DET format, run (inside "scripts_hico/"):

```
CUDA_VISIBLE_DEVICES=0 python2 main.py -fw new_test -ba 8 -r t -i t
```

To evaluate the results under the original HICO-DET scheme, run (inside "scripts_hico/HICO_eval/"):

```
bash compute_map.sh new_test 20
```

Please contact A S M Iftekhar (iftekhar@ucsb.edu) for any queries.
