Visual Relationship Detection with Deep Structural Ranking

The code is written in Python and PyTorch 0.2.0 (torch-0.2.0.post3).

Since I have graduated, I may not be able to respond to issues promptly. Thanks for your understanding.

Clone the repo

git clone git@github.com:GriffinLiang/vrd-dsr.git
cd vrd-dsr
git submodule update --init --recursive

OR
git clone --recursive git@github.com:GriffinLiang/vrd-dsr.git

Data Preparation

Download the VRD dataset (image, annotation, backup) and put it in ~/data. Replace ~/data/sg_dataset/sg_test_images/4392556686_44d71ff5a0_o.gif with ~/data/vrd/4392556686_44d71ff5a0_o.jpg
Download VGG16 trained on ImageNet and put it in the path ~/data
Download the meta data (so_prior.pkl) [Baidu YUN] or [Google Drive] and put it in ~/data/vrd
Download visual genome data (vg.zip) [Baidu YUN] or [Google Drive] and put it in ~/data/vg
Word2vec representations of the subject and object categories are provided in this project. If you want to use the model for novel categories, please refer to this blog.

The folder should be:

├── sg_dataset
│   ├── sg_test_images
│   ├── sg_train_images
│   
├── VGG_imagenet.npy
└── vrd
    ├── gt.mat
    ├── obj.txt
    ├── params_emb.pkl
    ├── proposal.pkl
    ├── rel.txt
    ├── so_prior.pkl
    ├── test.pkl
    ├── train.pkl
    └── zeroShot.mat
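
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the project: the helper name `missing_paths` and the list `EXPECTED_PATHS` are made up for illustration, and the paths are simply copied from the tree above, relative to ~/data.

```python
import os

# Paths copied from the folder tree in this README, relative to ~/data.
EXPECTED_PATHS = [
    'sg_dataset/sg_test_images',
    'sg_dataset/sg_train_images',
    'VGG_imagenet.npy',
    'vrd/gt.mat',
    'vrd/obj.txt',
    'vrd/params_emb.pkl',
    'vrd/proposal.pkl',
    'vrd/rel.txt',
    'vrd/so_prior.pkl',
    'vrd/test.pkl',
    'vrd/train.pkl',
    'vrd/zeroShot.mat',
]


def missing_paths(root='~/data'):
    """Return the expected files/folders that do not exist under root."""
    root = os.path.expanduser(root)
    return [p for p in EXPECTED_PATHS
            if not os.path.exists(os.path.join(root, p))]


if __name__ == '__main__':
    for path in missing_paths():
        print('missing: ' + path)
```

An empty result means every entry in the tree was found; it does not check the contents of the files.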

Data format

train.pkl or test.pkl
- python list
- each item is a dictionary with the following keys: {'img_path', 'classes', 'boxes', 'ix1', 'ix2', 'rel_classes'}
  - 'classes' and 'boxes' describe the objects contained in a single image.
  - 'ix1': subject index.
  - 'ix2': object index.
  - 'rel_classes': relationship for a subject-object pair.
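
The keys above can be read as follows. This is a sketch under stated assumptions rather than the project's own loader: it assumes 'ix1', 'ix2' and 'rel_classes' are parallel sequences with one entry per subject-object pair, and that 'ix1'/'ix2' index into 'classes' and 'boxes'. The helper names and the toy values are invented for illustration.

```python
import pickle


def load_annotations(path):
    """Load train.pkl or test.pkl, described above as a Python list."""
    with open(path, 'rb') as f:
        # Note: under Python 3, a pickle written by Python 2 may need
        # pickle.load(f, encoding='latin1') instead.
        return pickle.load(f)


def iter_pairs(item):
    """Yield one record per annotated subject-object pair of one image.

    Assumption: 'ix1', 'ix2' and 'rel_classes' are parallel sequences,
    and 'ix1'/'ix2' index into 'classes' and 'boxes'.
    """
    for s, o, rel in zip(item['ix1'], item['ix2'], item['rel_classes']):
        yield {
            'subject_class': item['classes'][s],
            'subject_box': item['boxes'][s],
            'object_class': item['classes'][o],
            'object_box': item['boxes'][o],
            'relationship': rel,  # passed through unchanged
        }


# Toy item with made-up values, only to show the access pattern.
toy_item = {
    'img_path': 'example.jpg',
    'classes': [0, 5],
    'boxes': [[0, 0, 10, 10], [5, 5, 20, 20]],
    'ix1': [0],
    'ix2': [1],
    'rel_classes': [[3]],
}
pairs = list(iter_pairs(toy_item))
```

With the real data, `for item in load_annotations(path): ...` followed by `iter_pairs(item)` would walk every annotated pair; the class and relationship values would presumably be looked up in obj.txt and rel.txt.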

proposal.pkl

    >>> proposals.keys()
    ['confs', 'boxes', 'cls']
    >>> proposals['confs'].shape, proposals['boxes'].shape, proposals['cls'].shape
    ((1000,), (1000,), (1000,))
    >>> proposals['confs'][0].shape, proposals['boxes'][0].shape, proposals['cls'][0].shape
    ((9, 1), (9, 4), (9, 1))
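
Going by the shapes printed above, each of the three arrays has one entry per image, and entry i holds the detections of image i (9 of them for the first image). A minimal sketch of collecting them per image might look like this. The function name and the toy values are invented, and the box coordinate order is not stated here, so the boxes are passed through as-is.

```python
def detections_for_image(proposals, i):
    """Collect the detections of image i as a list of dicts.

    Assumption: proposals['boxes'][i] has shape (n, 4), and
    proposals['confs'][i] and proposals['cls'][i] have shape (n, 1),
    as in the shapes shown in this README.
    """
    boxes = proposals['boxes'][i]
    confs = proposals['confs'][i]
    cls = proposals['cls'][i]
    dets = []
    for k in range(len(boxes)):
        dets.append({
            'cls': int(cls[k][0]),
            'conf': float(confs[k][0]),
            'box': [float(v) for v in boxes[k]],
        })
    return dets


# Toy stand-in for proposal.pkl: one image with two made-up detections.
toy_proposals = {
    'boxes': [[[0, 0, 10, 10], [5, 5, 20, 20]]],
    'confs': [[[0.9], [0.4]]],
    'cls': [[[1], [7]]],
}
dets = detections_for_image(toy_proposals, 0)
```

The function only uses indexing and `len`, so it should work both on the pickled arrays and on plain nested lists like the toy example.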

Prerequisites

Python 2.7
Pytorch 0.2.0
opencv-python
tabulate
CUDA 8.0 or higher

Installation

Edit ~/lib/make.sh to set CUDA_PATH and choose your -arch option to match your GPU.

GPU model                     Architecture
TitanX (Maxwell/Pascal)       sm_52
GTX 960M                      sm_50
GTX 1080 (Ti)                 sm_61
Grid K520 (AWS g2.2xlarge)    sm_30
Tesla K80 (AWS p2.xlarge)     sm_37

Build the Cython modules for the roi_pooling layer, choosing the right -arch to compile the CUDA code; refer to https://github.com/ruotianluo/pytorch-faster-rcnn.
```
cd lib
./make.sh
```
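
If you are unsure which -arch value to put in make.sh, the GPU table in this README can be turned into a small lookup. This is an illustrative sketch only: `ARCH_BY_GPU` and `arch_flag` are hypothetical names, the values are copied from the table without further verification, and GPUs that are not listed return None.

```python
# GPU model -> architecture, copied from the table in this README.
ARCH_BY_GPU = {
    'TitanX (Maxwell/Pascal)': 'sm_52',
    'GTX 960M': 'sm_50',
    'GTX 1080 (Ti)': 'sm_61',
    'Grid K520 (AWS g2.2xlarge)': 'sm_30',
    'Tesla K80 (AWS p2.xlarge)': 'sm_37',
}


def _normalize(name):
    """Lowercase and drop spaces so 'TITAN X' and 'TitanX' compare equal."""
    return name.lower().replace(' ', '')


def arch_flag(gpu_name):
    """Return '-arch=sm_XX' for a GPU listed in the table, else None."""
    key = _normalize(gpu_name)
    for name, arch in ARCH_BY_GPU.items():
        # Match on the model name before any parenthetical note.
        model = _normalize(name.split('(')[0])
        if model in key:
            return '-arch=' + arch
    return None


if __name__ == '__main__':
    print(arch_flag('GeForce GTX 1080 Ti'))
```

The GPU name can be taken from whatever your driver tools report; for a card missing from the table, look up its compute capability and set -arch by hand.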

Demo

Predicate demo: demo.py->pre_demo()
- Download epoch_4_checkpoint.pth.tar [Baidu YUN] or [Google Drive] and put it in ~/model
Relationship demo: demo.py->vrd_demo().
- Install faster-rcnn according to its README file. (Pay attention to ~/lib/make.sh: set CUDA_PATH and choose the -arch option that matches your GPU.)
- Download faster_rcnn_1_20_7559.pth [Baidu YUN] or [Google Drive] and put it in ~/model
- Thanks to Jianwei Yang and Jiasen Lu for the detector code!

Train

Model Structure

[Figure: model structure]

Run

cd tool
CUDA_VISIBLE_DEVICES=0 python train.py --dataset vrd --name VRD_RANK --epochs 10 --print-freq 500 --model_type RANK_IM

You can pass --no_so to discard the separate bbox visual input and --no_obj to discard the semantic cue.

This project contains all training and testing code for predicate detection. For relationship detection, our proposed pipeline has two stages. The first stage is object detection, which is not included in this project. I am trying to release the code ASAP. Until then, you may refer to other projects such as pytorch-faster-rcnn and faster-rcnn.pytorch.

Citation

If you use this code, please cite the following paper(s):

@inproceedings{liang2018Visual,
    title={Visual Relationship Detection with Deep Structural Ranking},
    author={Liang, Kongming and Guo, Yuhong and Chang, Hong and Chen, Xilin},
    booktitle={AAAI Conference on Artificial Intelligence},
    year={2018}
}

License

The source code and processed data can only be used for non-commercial purposes.
