DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue

Example results from the VisDial v1.0 validation dataset.

This is a PyTorch implementation of DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue (AAAI 2020).

If you use this code in your research, please consider citing:

  title =  {DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue},
  author =  {Jiang, Xiaoze and Yu, Jing and Qin, Zengchang and Zhuang, Yingying and Zhang, Xingxing and Hu, Yue and Wu, Qi},
  year =  {2020},
  pages = {11125--11132},
  booktitle = {Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)}


This code is implemented using PyTorch v1.0 and provides out-of-the-box support for CUDA 9 and CuDNN 7.

conda create -n visdialch python=3.6
conda activate visdialch  # activate the environment and install all dependencies
cd DualVD/
pip install -r requirements.txt


  1. Download the VisDial v1.0 dialog json files and images from here.
  2. Download the word counts file for the VisDial v1.0 train split from here. It is used to build the vocabulary.
  3. Use Faster-RCNN to extract image features from here.
  4. Use Large-Scale-VRD to extract visual relation embedding from here.
  5. Use Densecap to extract local captions from here.
  6. Generate ELMo word vectors from here.
  7. Download pre-trained GloVe word vectors from here.
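
As a rough illustration of how the word counts file from step 2 can be turned into a vocabulary, here is a minimal sketch. It assumes the file is a JSON object mapping each word to its frequency. The special tokens, the `min_count` threshold, and the function names `build_vocabulary` and `encode` are hypothetical and are not taken from this repository.

```python
import json

# Hypothetical special tokens; the repository may use different ones.
SPECIAL_TOKENS = ["<PAD>", "<UNK>"]


def build_vocabulary(word_counts, min_count=5):
    """Map every word seen at least `min_count` times to an integer index."""
    kept = [word for word, count in word_counts.items() if count >= min_count]
    # Most frequent words first; ties are broken alphabetically for determinism.
    kept.sort(key=lambda word: (-word_counts[word], word))
    return {token: index for index, token in enumerate(SPECIAL_TOKENS + kept)}


def encode(tokens, word2index):
    """Convert tokens to indices, sending out-of-vocabulary words to <UNK>."""
    unk_index = word2index["<UNK>"]
    return [word2index.get(token, unk_index) for token in tokens]


# Inline stand-in for the downloaded word counts file (assumed format).
counts = json.loads('{"the": 120, "dog": 7, "is": 64, "zebra": 1}')
vocab = build_vocabulary(counts, min_count=5)
print(encode(["the", "zebra", "is"], vocab))
```

Rare words such as "zebra" fall below the threshold and share the `<UNK>` index, which keeps the embedding table small.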


Train the DualVD model as:

python train.py --config-yml configs/lf_disc_faster_rcnn_x101_bs32.yml --gpu-ids 0 1 # provide more ids for multi-GPU execution other args...

The code has an --overfit flag, which is useful for rapid debugging. It takes a batch of 5 examples and overfits the model on them.
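
The idea behind such a flag can be sketched as a thin wrapper that exposes only the first few examples of a dataset. How this repository implements --overfit internally is not shown here; the class name `OverfitSubset` and the stand-in dataset below are hypothetical.

```python
class OverfitSubset:
    """Expose only the first `num_examples` items of an indexable dataset."""

    def __init__(self, dataset, num_examples=5):
        self.dataset = dataset
        # Never claim more examples than the underlying dataset holds.
        self.num_examples = min(num_examples, len(dataset))

    def __len__(self):
        return self.num_examples

    def __getitem__(self, index):
        if not 0 <= index < self.num_examples:
            raise IndexError(index)
        return self.dataset[index]


# Stand-in for the real dialog dataset (hypothetical).
full_dataset = list(range(1000))
debug_dataset = OverfitSubset(full_dataset)
print(len(debug_dataset))
```

If the model cannot drive the training loss close to zero on such a tiny subset, the problem most likely lies in the model or data pipeline rather than in the hyperparameters.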

Saving model checkpoints

The training script saves a model checkpoint at every epoch to the directory specified by --save-dirpath. Refer to visdialch/utils/checkpointing.py for more details on how checkpointing is managed.
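
When picking a checkpoint for later evaluation, a small helper can locate the most recent one in the save directory. The sketch below assumes checkpoints are named `checkpoint_<epoch>.pth`; that naming scheme and the helper names `checkpoint_path` and `latest_checkpoint` are assumptions, so check visdialch/utils/checkpointing.py for the actual convention.

```python
import re
from pathlib import Path

# Assumed file naming scheme: checkpoint_<epoch>.pth
CHECKPOINT_PATTERN = re.compile(r"checkpoint_(\d+)\.pth")


def checkpoint_path(save_dirpath, epoch):
    """Build the per-epoch checkpoint path under the assumed naming scheme."""
    return Path(save_dirpath) / f"checkpoint_{epoch}.pth"


def latest_checkpoint(save_dirpath):
    """Return the checkpoint with the highest epoch number, or None if absent."""
    found = []
    for path in Path(save_dirpath).glob("checkpoint_*.pth"):
        match = CHECKPOINT_PATTERN.fullmatch(path.name)
        if match:
            found.append((int(match.group(1)), path))
    if not found:
        return None
    # Compare numerically so that epoch 10 sorts after epoch 2.
    return max(found, key=lambda item: item[0])[1]
```

The returned path can then be passed to --load-pthpath.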


Use TensorBoard to monitor training progress. Recommended: run tensorboard --logdir /path/to/save_dir --port 8008 and visit localhost:8008 in the browser.


Evaluation of a trained model checkpoint can be done as follows:

python evaluate.py --config-yml /path/to/config.yml --load-pthpath /path/to/checkpoint.pth --split val --gpu-ids 0
