This repository contains the source code for the paper Delving Deeper into the Decoder for Video Captioning, which has been accepted by ECAI 2020.

The encoder-decoder framework is the most popular paradigm for the video captioning task, but some non-negligible problems remain in the decoder of a video captioning model. We propose three methods to improve the performance of the model. Experiments on the MSVD and MSR-VTT datasets demonstrate that our model achieves the best results under the BLEU, CIDEr, METEOR and ROUGE-L metrics, with significant gains of up to 11.7% on MSVD and 5% on MSR-VTT over the previous state-of-the-art models.
If you need more information about how to generate the training, validation and test data for the datasets, please refer to Semantics-AssistedVideoCaptioning.
First, create a directory for saved models:

```bash
cd path_to_directory_of_model
mkdir saves
```

`run_model.sh` is used for training or testing models:

- Specify the GPU you want to use by modifying the `CUDA_VISIBLE_DEVICES` value.
- `name` will be used in the name of the saved model during training.
- Specify the needed data paths by modifying the `corpus`, `ecores`, `tag` and `ref` values.
- `test` refers to the path of the saved model to be tested. Do not give a value to `test` if you want to train a model.

Then run `bash run_model.sh` for training or testing.
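For orientation, below is a minimal sketch of what the variable section of `run_model.sh` might look like. The actual script in this repository may differ: all paths, the entry script `train.py`, and its command-line flags are hypothetical placeholders; only the variable names (`CUDA_VISIBLE_DEVICES`, `name`, `corpus`, `ecores`, `tag`, `ref`, `test`) come from the instructions above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of run_model.sh -- all paths and the entry script are placeholders.

export CUDA_VISIBLE_DEVICES=0    # index of the GPU to use

name=my_model                    # used in the name of the saved model
corpus=path/to/corpus_data       # corpus data (placeholder path)
ecores=path/to/feature_data      # video feature data (placeholder path)
tag=path/to/tag_data             # tag data (placeholder path)
ref=path/to/reference_data       # reference captions (placeholder path)
test=                            # leave empty to train; set to a saved model path to test

# The entry point and flag names below are illustrative, not the repository's actual CLI.
python train.py --name "$name" --corpus "$corpus" --ecores "$ecores" \
    --tag "$tag" --ref "$ref" ${test:+--test "$test"}
```

With `test` left empty, the `${test:+...}` expansion adds no flag, so the same script serves both training and testing.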
Please cite the paper as follows:

```bibtex
@article{chen2020delving,
  title         = {Delving Deeper into the Decoder for Video Captioning},
  author        = {Haoran Chen and Jianmin Li and Xiaolin Hu},
  journal       = {CoRR},
  eprint        = {2001.05614},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2001.05614},
  year          = {2020}
}
```