Code for SAAT from "Syntax-Aware Action Targeting for Video Captioning" (Accepted to CVPR 2020). The implementation is based on "Consensus-based Sequence Training for Video Captioning".
(Check out the coco-caption
and cider
projects into your working directory)
Data can be downloaded here (1.6GB). This folder contains:
val_videodatainfo.json
is a symbolic link to train_videodatainfo.json
)make -f SpecifiedMakefile test [options]
Please refer to the Makefile (and opts_svo.py file) for the set of available train/test options. For example, to reproduce the reported result
make -f Makefile_msrvtt_svo test GID=0 EXP_NAME=xe FEATS="irv2 c3d category" BFEATS="roi_feat roi_box" USE_RL=0 CST=0 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG LAMBDA=20
To train the model using XE loss
make -f Makefile_msrvtt_svo train GID=0 EXP_NAME=xe FEATS="irv2 c3d category" BFEATS="roi_feat roi_box" USE_RL=0 CST=0 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG MAX_EPOCH=100 LAMBDA=20
If you want to change the input features, modify the FEATS
variable in above commands.
@InProceedings{Zheng_2020_CVPR,
author = {Zheng, Qi and Wang, Chaoyue and Tao, Dacheng},
title = {Syntax-Aware Action Targeting for Video Captioning},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}