PyTorch implementation for the paper:
Dual Attention Networks for Visual Reference Resolution in Visual Dialog
Gi-Cheon Kang, Jaeseo Lim, and Byoung-Tak Zhang
In EMNLP 2019
If you use this code in your published research, please consider citing:
@inproceedings{kang2019dual,
title={Dual Attention Networks for Visual Reference Resolution in Visual Dialog},
author={Kang, Gi-Cheon and Lim, Jaeseo and Zhang, Byoung-Tak},
booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
pages = {2024--2033},
year={2019}
}
This starter code is implemented using PyTorch v0.3.1 with CUDA 8 and CuDNN 7.
It is recommended to set up this source code using Anaconda or Miniconda.
git clone https://github.com/gicheonkang/DAN-VisDial
conda create -n dan_visdial python=3.6
# activate the environment and install all dependencies
conda activate dan_visdial
cd DAN-VisDial/
pip install -r requirements.txt
We use features from a Faster R-CNN pre-trained on Visual Genome as image features. Download the image features below, and put each file under the $PROJECT_ROOT/data/{SPLIT_NAME}_feature directory. An image_id to R-CNN bounding box index file ({SPLIT_NAME}_imgid2idx.pkl) is also needed because the number of bounding boxes per image is not fixed (ranging from 10 to 100).
- train_btmup_f.hdf5: Bottom-up features of 10 to 100 proposals from images of train split (32GB).
- train_imgid2idx.pkl: image_id to bbox index file for train split
- val_btmup_f.hdf5: Bottom-up features of 10 to 100 proposals from images of validation split (0.5GB).
- val_imgid2idx.pkl: image_id to bbox index file for val split
- test_btmup_f.hdf5: Bottom-up features of 10 to 100 proposals from images of test split (2GB).
- test_imgid2idx.pkl: image_id to bbox index file for test split

Download the GloVe pretrained word vectors, and keep glove.6B.300d.txt under the $PROJECT_ROOT/data/glove directory.
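The role of the image_id to bounding box index file can be illustrated with a small sketch. It assumes, purely for illustration, that the features of all images are stored back to back in one flat array and that a per-image (start, end) span is kept alongside the index. The names build_index, boxes_for, and spans are hypothetical and are not taken from this repository; the actual layout of the .hdf5 and .pkl files may differ.

```python
from typing import Dict, List, Tuple


def build_index(box_counts: Dict[int, int]) -> Tuple[Dict[int, int], List[Tuple[int, int]]]:
    """Map each image_id to a row index, and each row index to a (start, end) span.

    Hypothetical helper: the real files may store this information differently.
    """
    imgid2idx: Dict[int, int] = {}
    spans: List[Tuple[int, int]] = []
    offset = 0
    for idx, (image_id, n_boxes) in enumerate(box_counts.items()):
        imgid2idx[image_id] = idx
        spans.append((offset, offset + n_boxes))
        offset += n_boxes
    return imgid2idx, spans


def boxes_for(image_id: int,
              imgid2idx: Dict[int, int],
              spans: List[Tuple[int, int]],
              flat_features: List[List[float]]) -> List[List[float]]:
    """Return the variable-length list of box features belonging to one image."""
    start, end = spans[imgid2idx[image_id]]
    return flat_features[start:end]


# Toy data: three images with 3, 1, and 2 boxes (real images have 10 to 100),
# and a 4-dimensional feature per box.
box_counts = {101: 3, 205: 1, 309: 2}
flat_features = [[float(i)] * 4 for i in range(6)]

imgid2idx, spans = build_index(box_counts)
picked = boxes_for(205, imgid2idx, spans, flat_features)
print(len(picked))
```

Because each image owns a different number of rows, a lookup table of this kind is what lets a data loader slice out exactly the boxes for one image.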
# data preprocessing
cd DAN-VisDial/data/
python prepro.py
# Word embedding vector initialization (GloVe)
cd ../utils
python utils.py
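For readers unfamiliar with GloVe initialization, the general idea is sketched below. This is not the code in utils.py. It is a minimal illustration that assumes the usual GloVe text format (one word followed by its vector components per line) and assigns small random vectors to words missing from GloVe. The function names, the random range, and the toy vocabulary are all assumptions.

```python
import random
from typing import Dict, Iterable, List


def parse_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe-style text lines: a word followed by space-separated floats."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def init_embeddings(vocab: List[str],
                    vectors: Dict[str, List[float]],
                    dim: int,
                    seed: int = 0) -> List[List[float]]:
    """Build one embedding row per vocabulary word.

    Words found in GloVe copy their pretrained vector; the rest get a small
    random vector (an assumed fallback, not necessarily what utils.py does).
    """
    rng = random.Random(seed)
    matrix: List[List[float]] = []
    for word in vocab:
        if word in vectors:
            matrix.append(list(vectors[word]))
        else:
            matrix.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return matrix


# Toy 3-dimensional vectors standing in for glove.6B.300d.txt.
toy_lines = ["dog 0.1 0.2 0.3", "cat 0.4 0.5 0.6"]
vectors = parse_glove(toy_lines)
embeddings = init_embeddings(["cat", "zebra", "dog"], vectors, dim=3)
print(embeddings[0])
```

With the real file, the same functions could be fed an opened glove.6B.300d.txt (UTF-8) with dim=300.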
Simple run
python train.py
By default, the model saves a checkpoint at every epoch. You can change this with the -save_step option.
The log file checkpoints/start/time/log.txt records the epoch, loss, and learning rate.
A trained model checkpoint can be evaluated as follows:
python evaluate.py -load_path /path/to/.pth -split val
Validation scores can be computed offline. To obtain scores on the test split, you have to submit a json file to the online evaluation server. You can generate the json file with the -save_ranks=True option.
We provide the pre-trained model reported as the best single model in the paper.
To reproduce the results reported in the paper, please run the command below and submit the resulting json file to the online evaluation server.
python evaluate.py -load_path /path/to/dan_disc_epoch_12.pth -split test -use_gt False -save_ranks True
Performance on v1.0 test-std (trained on v1.0 train):
Model | NDCG | MRR | R@1 | R@5 | R@10 | Mean |
---|---|---|---|---|---|---|
DAN | 0.5759 | 0.6320 | 49.63 | 79.75 | 89.35 | 4.30 |
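For reference, the retrieval columns in the table (MRR, R@k, Mean) are conventionally derived from the 1-based rank that the model assigns to the ground-truth answer in each dialog round. The sketch below shows those standard definitions on toy ranks. It is not the evaluation code of this repository or of the evaluation server, the function name is hypothetical, and NDCG is omitted because it additionally requires per-candidate relevance scores.

```python
from typing import Dict, Sequence


def retrieval_metrics(gt_ranks: Sequence[int],
                      ks: Sequence[int] = (1, 5, 10)) -> Dict[str, float]:
    """Compute MRR, mean rank, and recall@k from 1-based ground-truth ranks."""
    n = len(gt_ranks)
    metrics = {
        # Mean reciprocal rank: average of 1 / rank (higher is better).
        "mrr": sum(1.0 / r for r in gt_ranks) / n,
        # Mean rank of the ground-truth answer (lower is better).
        "mean": sum(gt_ranks) / n,
    }
    for k in ks:
        # Recall@k as a percentage: share of rounds with the answer in the top k.
        metrics[f"r@{k}"] = 100.0 * sum(1 for r in gt_ranks if r <= k) / n
    return metrics


# Toy ranks for four dialog rounds (illustrative values only).
scores = retrieval_metrics([1, 2, 4, 10])
print(scores)
```

In the table above, NDCG and MRR are reported as fractions, R@1, R@5, and R@10 as percentages, and Mean as a rank.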
MIT License
This work was partly supported by the Korea government (2015-0-00310-SW.StarLab, 2017-0-01772-VTT, 2018-0-00622-RMI, 2019-0-01367-BabyMind, 10060086-RISF, P0006720-GENKO), and the ICT at Seoul National University.