Video-QAP (NAACL21)

Video Question Answering with Phrases via Semantic Roles
Arka Sadhu, Kan Chen, Ram Nevatia
NAACL 2021

Video Question Answering has been studied through the lens of N-way phrase classification. While this eases evaluation, it severely limits the task's application in the wild. Here, we instead require the model to generate the answer, and we propose a novel evaluation metric based on relative scoring and contrastive scoring. We further create two datasets, ActivityNet-SRL-QA and Charades-SRL-QA.

Quick Start

Clone repo:

git clone https://github.com/TheShadow29/Video-QAP
cd Video-QAP
export ROOT=$(pwd)

Set up a new conda environment using the provided vidqap_env.yml file. Please refer to the Miniconda documentation for details on installing conda.

MINICONDA_ROOT=[to your Miniconda/Anaconda root directory]
conda env create -f vidqap_env.yml --prefix $MINICONDA_ROOT/envs/vidqap_pyt
conda activate vidqap_pyt

See INSTALL.md for instructions on installing fairseq.
See DATA.md for instructions on downloading the ActivityNet-SRL-QA and Charades-SRL-QA datasets.

Training

Configuration files are inside configs.

cd $ROOT
python code/main_dist.py "vogqap_asrlqa" --ds_to_use='asrl_qa' --mdl.name='vog_qa' --train.bs=4 --train.epochs=10 --train.lr=1e-4

Set --mdl.name to one of the available models: lqa, mtx_qa, butd_qa, vog_qa.
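To train each of the listed models in turn, the training command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper build_train_cmd and the experiment names of the form "<model>_asrlqa" are hypothetical, and it assumes the first positional argument is a free-form experiment identifier. Only the flags shown in the training command are used.

```python
import shlex

# Model names listed in this README for --mdl.name.
MODELS = ["lqa", "mtx_qa", "butd_qa", "vog_qa"]


def build_train_cmd(model, exp_name, ds="asrl_qa", bs=4, epochs=10, lr="1e-4"):
    """Return the argv list for one training run (hypothetical helper).

    Defaults mirror the values in the README's training command.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    return [
        "python", "code/main_dist.py", exp_name,
        f"--ds_to_use={ds}",
        f"--mdl.name={model}",
        f"--train.bs={bs}",
        f"--train.epochs={epochs}",
        f"--train.lr={lr}",
    ]


if __name__ == "__main__":
    # Dry run: print one command per model instead of launching training.
    # The "<model>_asrlqa" experiment names are an assumption for illustration.
    for model in MODELS:
        print(shlex.join(build_train_cmd(model, f"{model}_asrlqa")))
```

Each printed line can be pasted into a shell from $ROOT, or the argv list can be passed to subprocess.run(cmd, check=True) to launch the run directly.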

Evaluation

The main evaluation file is vidqa_code/eval_fn_vidqap.py. It can also be used as a stand-alone script for a separate dataset.

cd $ROOT
python vidqa_code/eval_fn_vidqap.py --pred_file=... --ds_to_use='asrl_qa' --split_type='valid' --met_keys='meteor,rouge,bert_score'
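When several prediction files need scoring, for example one per trained model, the evaluation command can be generated for each file. This is a rough sketch under stated assumptions: build_eval_cmd is a hypothetical helper, the prediction paths are placeholders to be replaced with real files, and --met_keys is assumed to take a comma-separated list, as in the command above. Only the three metric keys shown in this README are used.

```python
import shlex

# Metric keys that appear in the README's evaluation command.
DEFAULT_MET_KEYS = ("meteor", "rouge", "bert_score")


def build_eval_cmd(pred_file, ds="asrl_qa", split="valid", met_keys=DEFAULT_MET_KEYS):
    """Return the argv list for one evaluation run (hypothetical helper)."""
    if not met_keys:
        raise ValueError("at least one metric key is required")
    return [
        "python", "vidqa_code/eval_fn_vidqap.py",
        f"--pred_file={pred_file}",
        f"--ds_to_use={ds}",
        f"--split_type={split}",
        # Metric keys are joined with commas, matching the README command.
        f"--met_keys={','.join(met_keys)}",
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute the prediction files produced by training.
    pred_files = ["PATH_TO_PREDICTIONS_A", "PATH_TO_PREDICTIONS_B"]
    for pred_file in pred_files:
        print(shlex.join(build_eval_cmd(pred_file)))
```

Passing a subset such as met_keys=("meteor",) produces a command that requests only that metric, which may be useful when a quick check is enough.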

ToDo:

[ ] Add more documentation on how to run the models
[ ] Add pre-trained model weights.
[ ] Support dataset creation for new caption dataset.

Acknowledgements:

We thank:

@LuoweiZhou for their codebase on GVD (https://github.com/facebookresearch/grounded-video-description) along with the extracted features for ActivityNet.
@antoine77340 for their codebase on S3D pretrained on HowTo100M (https://github.com/antoine77340/S3D_HowTo100M), used for feature extraction on Charades.
allennlp for providing the demo and pre-trained model for SRL.
fairseq for the sequence generation implementation and the transformer encoder-decoder models.

Citation

@inproceedings{Sadhu2021VideoQA,
  title={Video Question Answering with Phrases via Semantic Roles},
  author={Arka Sadhu and Kan Chen and R. Nevatia},
  booktitle={NAACL},
  year={2021}
}
