Example results from the VisDial v1.0 validation dataset.
This is a PyTorch implementation of "Learning Dual Encoding Model for Adaptive Visual Understanding in Visual Dialogue", published in IEEE Transactions on Image Processing.
If you use this code in your research, please consider citing:
@article{yu2020learning,
  title   = {Learning Dual Encoding Model for Adaptive Visual Understanding in Visual Dialogue},
  author  = {Yu, Jing and Jiang, Xiaoze and Qin, Zengchang and Zhang, Weifeng and Hu, Yue and Wu, Qi},
  journal = {IEEE Transactions on Image Processing},
  volume  = {30},
  pages   = {220--233},
  year    = {2020}
}
A previous version of our dual encoding model was published in AAAI 2020: DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue. [Paper] [Code]
This code is implemented with PyTorch v1.0 and provides out-of-the-box support for CUDA 9 and cuDNN 7.
conda create -n visdialch python=3.6
conda activate visdialch # activate the environment and install all dependencies
cd DualVD/
pip install -r requirements.txt
Train the DualVD model as:
python train.py --config-yml configs/lf_disc_faster_rcnn_x101_bs32.yml --gpu-ids 0 1 # provide more ids for multi-GPU execution other args...
The code has an --overfit flag, which can be useful for rapid debugging: it takes a batch of 5 examples and overfits the model on them.
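The idea behind such an overfit flag can be sketched as follows (a minimal illustration, not the repository's actual implementation; `overfit_subset` is a hypothetical helper, and the batch size of 5 mirrors the description above):

```python
def overfit_subset(dataset, batch_size=5):
    """Return a fixed mini-dataset of at most `batch_size` examples.

    Training repeatedly on this tiny, fixed subset should drive the
    loss toward zero within a few epochs, which is a fast sanity
    check that the model, loss, and optimizer are wired correctly.
    """
    return dataset[:batch_size]

# Stand-in for the full list of VisDial training examples.
full_dataset = list(range(100))
train_data = overfit_subset(full_dataset)   # the same 5 examples every epoch
```

If the loss does not drop rapidly on this subset, there is likely a bug in the training loop rather than a capacity or data problem.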
DualVD-LF is trained with configs/lf_disc_faster_rcnn_x101.yml.
DualVD-MN is trained with configs/mn_disc_faster_rcnn_x101.yml.
This script saves model checkpoints at every epoch under the path specified by --save-dirpath. Refer to visdialch/utils/checkpointing.py for details on how checkpointing is managed.
Use TensorBoard for logging training progress. Recommended: execute tensorboard --logdir /path/to/save_dir --port 8008 and visit localhost:8008 in the browser.
Evaluation of a trained model checkpoint can be done as follows:
python evaluate.py --config-yml /path/to/config.yml --load-pthpath /path/to/checkpoint.pth --split val --gpu-ids 0
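VisDial evaluation is a ranking task: for each question, the model scores a set of candidate answers, and retrieval metrics such as mean reciprocal rank (MRR) and recall@k are computed from the rank of the ground-truth answer. The two metrics can be sketched as follows (a minimal illustration of the standard formulas, not this repository's evaluation code):

```python
def mrr(ranks):
    """Mean reciprocal rank; `ranks` are 1-indexed ranks of the
    ground-truth answer among the scored candidates."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def recall_at_k(ranks, k):
    """Fraction of questions whose ground-truth answer is ranked
    within the top k candidates."""
    return sum(r <= k for r in ranks) / len(ranks)


# Example: ground-truth answers ranked 1st, 3rd, and 12th.
ranks = [1, 3, 12]
# mrr(ranks) == (1 + 1/3 + 1/12) / 3
# recall_at_k(ranks, 5) == 2/3
```

Higher MRR and recall@k (and lower mean rank) indicate better answer retrieval.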