This repository accompanies our IEEE Access paper "Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations" and contains the validation experiment code and trained models for the SpatialSense and VRD datasets.
This project is built with Python 3.6, PyTorch 1.1.0, and CUDA 9.0, and is largely based on VL-BERT. Please follow the original VL-BERT instructions to set up a conda environment.
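As a rough sketch of that setup (the environment name is arbitrary, and the requirements.txt step is an assumption carried over from VL-BERT; defer to the original instructions where they differ):

```bash
# Minimal environment sketch; "rvl-bert" is a placeholder name and
# requirements.txt is assumed to exist as in VL-BERT.
conda create -n rvl-bert python=3.6 -y
conda activate rvl-bert
# PyTorch 1.1.0 built against CUDA 9.0, as stated above
conda install pytorch==1.1.0 cudatoolkit=9.0 -c pytorch
pip install -r requirements.txt
```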
Download the SpatialSense dataset, put the files under `$RVL_BERT_ROOT/data/spasen/`, and unzip `images.tar.gz` as `images/` there. Ensure there are two folders (`flickr/` and `nyu/`) below `$RVL_BERT_ROOT/data/spasen/images/`.
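A hypothetical sequence that produces this layout (adjust the archive path to wherever you downloaded it):

```bash
cd $RVL_BERT_ROOT/data/spasen
tar -xzf images.tar.gz    # unpacks as images/
ls images/                # expected: flickr/  nyu/
```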
Download `sg_dataset.zip` (alternatively from Baidu) and the annotations. Put the `sg_train_images/` and `sg_test_images/` folders under `$RVL_BERT_ROOT/data/vrd/images/` and the `.json` annotation files under `$RVL_BERT_ROOT/data/vrd/`.
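A hypothetical sequence for arranging the VRD files (the internal layout of `sg_dataset.zip` is an assumption; move the folders as needed to match the paths above):

```bash
cd $RVL_BERT_ROOT/data/vrd
mkdir -p images
unzip /path/to/sg_dataset.zip             # assumed to contain sg_train_images/ and sg_test_images/
mv sg_train_images sg_test_images images/
ls images/                                # expected: sg_train_images/  sg_test_images/
ls ./*.json                               # annotation files stay directly under data/vrd/
```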
Download the pretrained weights here and put the `pretrained_model/` folder under `$RVL_BERT_ROOT/model/`.
Download the trained checkpoint here and put the `.model` file under `$RVL_BERT_ROOT/checkpoints/spasen/`.
Download the trained checkpoints and put the `.model` files under `$RVL_BERT_ROOT/checkpoints/vrd/`.
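After these steps the checkpoint tree should look roughly as follows (file names are taken from the test commands below; `pretrained_model/` sits under `$RVL_BERT_ROOT/model/` as described above):

```bash
$ tree $RVL_BERT_ROOT/checkpoints
checkpoints
├── spasen
│   └── full-model-e44.model
└── vrd
    ├── basic-e59.model
    ├── basic-vl-e59.model
    ├── basic-vl-s-e59.model
    └── basic-vl-s-m-e59.model
```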
Run the following commands to reproduce the experimental results. A single GPU (NVIDIA Quadro RTX 6000, 24 GB memory) is used by default.
SpatialSense, full model:
python spasen/test.py --cfg cfgs/spasen/full-model.yaml --ckpt checkpoints/spasen/full-model-e44.model --bs 8 --gpus 0 --model-dir ./ --result-path results/ --result-name spasen_full_model --split test --log-dir logs/
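After the run, the predictions should be written under `results/` with the given result name (a sketch; the exact file name and extension depend on the test script):

```bash
ls results/ | grep spasen_full_model
```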
Basic model:
python vrd/test.py --cfg cfgs/vrd/basic.yaml --ckpt checkpoints/vrd/basic-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic --split test --log-dir logs/
Basic model + Visual-Linguistic Commonsense Knowledge:
python vrd/test.py --cfg cfgs/vrd/basic_vl.yaml --ckpt checkpoints/vrd/basic-vl-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl --split test --log-dir logs/
Basic model + Visual-Linguistic Commonsense Knowledge + Spatial Module:
python vrd/test.py --cfg cfgs/vrd/basic_vl_s.yaml --ckpt checkpoints/vrd/basic-vl-s-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl_s --split test --log-dir logs/
Full model:
python vrd/test.py --cfg cfgs/vrd/basic_vl_s_m.yaml --ckpt checkpoints/vrd/basic-vl-s-m-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl_s_m --split test --log-dir logs/
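Since the four VRD commands differ only in the config, checkpoint, and result names, a convenience loop (a sketch; all names are derived from the commands above) can run them in one go:

```bash
for v in basic basic_vl basic_vl_s basic_vl_s_m; do
  # map config name (underscores) to checkpoint name (hyphens)
  ckpt="checkpoints/vrd/$(echo "$v" | tr '_' '-')-e59.model"
  python vrd/test.py --cfg "cfgs/vrd/$v.yaml" --ckpt "$ckpt" \
    --bs 1 --gpus 0 --model-dir ./ --result-path results/ \
    --result-name "vrd_$v" --split test --log-dir logs/
done
```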
This repository is mainly based on VL-BERT.
Please cite our paper if you find our paper or code helpful for your research!
@ARTICLE{9387302,
  author={M. -J. {Chiou} and R. {Zimmermann} and J. {Feng}},
  journal={IEEE Access},
  title={Visual Relationship Detection With Visual-Linguistic Knowledge From Multimodal Representations},
  year={2021},
  volume={9},
  number={},
  pages={50441-50451},
  doi={10.1109/ACCESS.2021.3069041}}