RVL-BERT

This repository accompanies our IEEE Access paper "Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations" and contains validation experiments code and the models on the SpatialSense and the VRD dataset.

Image of RVL-BERT architecture

Installation

This project is constructed with Python 3.6, PyTorch 1.1.0 and CUDA 9.0 and largely based on VL-BERT.

Please follow the original instruction to install an conda environment.

Dataset

SpatialSense

Download the SpatialSense dataset here.
Put the files under $RVL_BERT_ROOT/data/spasen and unzip the images.tar.gz as images/ there. Ensure there're two folders (flickr/ and nyu) below $RVL_BERT_ROOT/data/spasen/images/.

VRD

Download the VRD dataset: images (Backup: download sg_dataset.zip from Baidu) and annotations
Put the sg_train_images/ and sg_test_images/ folders under $RVL_BERT_ROOT/data/vrd/images.
Put all .json files under $RVL_BERT_ROOT/data/vrd/.

Checkpoints & Pretrained Weights

Common

Download the pretrained weights here and put the pretrained_model/ folder under $RVL_BERT_ROOT/model/.

SpatialSense

Download the trained checkpoint here and put the .model file under $RVL_BERT_ROOT/checkpoints/spasen/.

VRD

Download the trained checkpoints and put the .model files under $RVL_BERT_ROOT/checkpoints/vrd/:

Validation

Run the following commands to reproduce experiment results. A single GPU (NVIDIA Quadro RTX 6000, 24G memory) is used by default.

SpatialSense

Full model

python spasen/test.py --cfg cfgs/spasen/full-model.yaml --ckpt checkpoints/spasen/full-model-e44.model --bs 8 --gpus 0 --model-dir ./ --result-path results/ --result-name spasen_full_model --split test --log-dir logs

VRD

Basic model:

python vrd/test.py --cfg cfgs/vrd/basic.yaml --ckpt checkpoints/vrd/basic-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic --split test --log-dir logs/

Basic model + Visual-Linguistic Commonsense Knowledge

python vrd/test.py --cfg cfgs/vrd/basic_vl.yaml --ckpt checkpoints/vrd/basic-vl-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl --split test --log-dir logs/

Basic model + Visual-Linguistic Commonsense Knowledge + Spatial Module

python vrd/test.py --cfg cfgs/vrd/basic_vl_s.yaml --ckpt checkpoints/vrd/basic-vl-s-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl --split test --log-dir logs/

Full model

python vrd/test.py --cfg cfgs/vrd/basic_vl_s_m.yaml --ckpt checkpoints/vrd/basic-vl-s-m-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl --split test --log-dir logs/

Credit

This repository is mainly based on VL-BERT.

Citation

Please cite our paper if you find the paper or our code help your research!

@ARTICLE{9387302,
  author={M. -J. {Chiou} and R. {Zimmermann} and J. {Feng}},
  journal={IEEE Access}, 
  title={Visual Relationship Detection With Visual-Linguistic Knowledge From Multimodal Representations}, 
  year={2021},
  volume={9},
  number={},
  pages={50441-50451},
  doi={10.1109/ACCESS.2021.3069041}}

coldmanck / RVL-BERT

readme

RVL-BERT

Installation

Dataset

SpatialSense

VRD

Checkpoints & Pretrained Weights

Common

SpatialSense

VRD

Validation

SpatialSense

VRD

Credit

Citation