# Temporally-language-grounding
A PyTorch implementation of several state-of-the-art models for temporal language grounding in untrimmed videos.
## Requirements
- Python 2.7
- PyTorch 0.4.1
- matplotlib

The code is written for the Charades-STA dataset.
## Three models for this task
Supervised learning based methods:
- TALL: Temporal Activity Localization via Language Query
- MAC: Mining Activity Concepts for Language-based Temporal Localization

Reinforcement learning based method:
- A2C: Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos
## Performance

| Methods | R@1, IoU=0.7 | R@1, IoU=0.5 | R@5, IoU=0.7 | R@5, IoU=0.5 |
| ------- | ------------ | ------------ | ------------ | ------------ |
| TALL    | 8.63         | 24.09        | 29.33        | 59.60        |
| MAC     | 12.31        | 29.68        | 37.31        | 64.14        |
| A2C     | 14.25        | 32.66        | None         | None         |
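For reference, the "R@K, IoU=m" numbers above count a query as correct when any of the top-K predicted segments overlaps the ground-truth segment with temporal IoU of at least m. A minimal sketch of that metric is below; the function names (`temporal_iou`, `recall_at_k`) are illustrative and not taken from this repository's code.

```python
def temporal_iou(pred, gt):
    """IoU between two (start, end) segments, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_k(ranked_preds, gt, k, iou_thresh):
    """1.0 if any of the top-k ranked predictions reaches the IoU threshold."""
    return float(any(temporal_iou(p, gt) >= iou_thresh
                     for p in ranked_preds[:k]))
```

Averaging `recall_at_k` over all test queries gives the percentages reported in the table.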
## Features download
## Training and testing
To train and test TALL, run `python main_charades_SL.py --model TALL`

To train and test MAC, run `python main_charades_SL.py --model MAC`

To train and test A2C, run `python main_charades_RL.py`
## Acknowledgements
Thanks to the authors of the original TALL and MAC implementations, and to the awesome PyTorch team.