# Temporally-language-grounding
A PyTorch implementation of several state-of-the-art models for temporal language grounding in untrimmed videos.
## Requirements
- Python 2.7
- PyTorch 0.4.1
- matplotlib

The code is written for the Charades-STA dataset.
## Three models for this task
Supervised learning based methods:
- TALL: Temporal Activity Localization via Language Query
- MAC: Mining Activity Concepts for Language-based Temporal Localization

Reinforcement learning based method:
- A2C: Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos
## Performance

| Methods | R@1, IoU=0.7 | R@1, IoU=0.5 | R@5, IoU=0.7 | R@5, IoU=0.5 |
| ------- | ------------ | ------------ | ------------ | ------------ |
| TALL    | 8.63         | 24.09        | 29.33        | 59.60        |
| MAC     | 12.31        | 29.68        | 37.31        | 64.14        |
| A2C     | 14.25        | 32.66        | None         | None         |
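For reference, the "R@K, IoU=m" numbers above count a query as correct when any of the top-K predicted segments overlaps the ground-truth segment with temporal IoU of at least m. A minimal sketch of that metric is below; the function names (`temporal_iou`, `recall_at_k`) are illustrative and not taken from this repository's code.

```python
def temporal_iou(pred, gt):
    """IoU between two (start, end) segments, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_k(ranked_preds, gt, k, iou_thresh):
    """1.0 if any of the top-k ranked predictions reaches the IoU threshold."""
    return float(any(temporal_iou(p, gt) >= iou_thresh
                     for p in ranked_preds[:k]))
```

Averaging `recall_at_k` over all test queries gives the percentages reported in the table.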
## Features download
## Training and testing
To train and test TALL, run `python main_charades_SL.py --model TALL`

To train and test MAC, run `python main_charades_SL.py --model MAC`

To train and test A2C, run `python main_charades_RL.py`
## Acknowledgements
Thanks to the authors of the original TALL and MAC implementations, and to the awesome PyTorch team.