amazon-science / tubelet-transformer

This is an official implementation of TubeR: Tubelet Transformer for Video Action Detection
https://openaccess.thecvf.com/content/CVPR2022/supplemental/Zhao_TubeR_Tubelet_Transformer_CVPR_2022_supplemental.pdf
Apache License 2.0

TubeR: Tubelet Transformer for Video Action Detection

This repo contains the supporting code to reproduce the spatio-temporal action detection results of TubeR: Tubelet Transformer for Video Action Detection.

Updates

08/08/2022 Initial commits

Results and Models

AVA 2.1 Dataset

| Backbone | Pretrain | #view | mAP | FLOPs | config | model |
|---|---|---|---|---|---|---|
| CSN-50 | Kinetics-400 | 1 view | 27.2 | 78G | config | S3 |
| CSN-50 (with long-term context) | Kinetics-400 | 1 view | 28.8 | 78G | config | Coming soon |
| CSN-152 | Kinetics-400+IG65M | 1 view | 29.7 | 120G | config | S3 |
| CSN-152 (with long-term context) | Kinetics-400+IG65M | 1 view | 31.7 | 120G | config | Coming soon |

AVA 2.2 Dataset

| Backbone | Pretrain | #view | mAP | FLOPs | config | model |
|---|---|---|---|---|---|---|
| CSN-152 | Kinetics-400+IG65M | 1 view | 31.1 | 120G | config | S3 |
| CSN-152 (with long-term context) | Kinetics-400+IG65M | 1 view | 33.4 | 120G | config | Coming soon |

JHMDB Dataset

| Backbone | #view | mAP@0.2 | mAP@0.5 | config | model |
|---|---|---|---|---|---|
| CSN-152 | 1 view | 87.4 | 82.3 | config | S3 |

Usage

The project is built on GluonCV-torch. Please refer to the tutorial for details.

Dependency

The project is tested working on:

Dataset

Please download asset.zip and unzip it into ./datasets.

[AVA] Please refer to DATASET.md for downloading and pre-processing the AVA dataset.

[JHMDB] Please refer to JHMDB for the JHMDB dataset and to Dataset Section for the UCF dataset. You can also refer to ACT-Detector to prepare the two datasets.
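The unzip step for asset.zip can also be scripted. The following is a minimal sketch using only the Python standard library. It assumes asset.zip has already been downloaded into the current directory; the helper name `extract_assets` and its default paths are illustrative and are not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_assets(archive="asset.zip", target="./datasets"):
    """Extract `archive` into `target` and return the archive's member names.

    Hypothetical helper: the default paths mirror the instructions above,
    but nothing here is provided by the TubeR code base itself.
    """
    archive_path = Path(archive)
    if not archive_path.is_file():
        raise FileNotFoundError(f"{archive_path} not found; download it first")

    target_path = Path(target)
    # Create ./datasets (and any missing parents) if it does not exist yet.
    target_path.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_path)
        return zf.namelist()
```

Calling `extract_assets()` with no arguments unpacks ./asset.zip into ./datasets and returns the list of extracted entries, which is a quick way to confirm what the archive contained.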

Inference

To run inference, first modify the config file:

Then run:

# run testing
python3 eval_tuber_ava.py <CONFIG_FILE>

# for example, to evaluate TubeR (CSN-152) on AVA 2.1, run:
python3 eval_tuber_ava.py configuration/TubeR_CSN152_AVA21.yaml
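When evaluating more than one config, it can help to wrap the command above in a small script that fails early if a config path is wrong. The sketch below only assembles and launches the same `python3 eval_tuber_ava.py <CONFIG_FILE>` command; the helper names `build_eval_command` and `run_eval` are hypothetical, and only the script name and config path are taken from this README.

```python
import subprocess
from pathlib import Path


def build_eval_command(config_file, script="eval_tuber_ava.py"):
    """Return the argv list for `python3 <script> <config_file>`."""
    return ["python3", script, str(config_file)]


def run_eval(config_file):
    """Run evaluation for one config (hypothetical wrapper, not a repo API)."""
    if not Path(config_file).is_file():
        raise FileNotFoundError(f"config file not found: {config_file}")
    # check=True raises CalledProcessError if the script exits with a non-zero code.
    subprocess.run(build_eval_command(config_file), check=True)
```

For example, `run_eval("configuration/TubeR_CSN152_AVA21.yaml")` is intended to be run from the repository root, so that both the script and the config path resolve.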

Training

To train TubeR from scratch, first modify the config file:

Then run:

# run training from scratch
python3 train_tuber_ava.py <CONFIG_FILE>

# for example, to train on AVA 2.1 from scratch, run:
python3 train_tuber_ava.py configuration/TubeR_CSN152_AVA21.yaml
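Because training starts with editing a config file, one option is to edit a copy and leave the shipped YAML untouched. The sketch below shows that step with the standard library only; the helper name `make_working_config` and the `_local` suffix are assumptions made for illustration, not conventions of this repository.

```python
import shutil
from pathlib import Path


def make_working_config(source, suffix="_local"):
    """Copy `source` to <stem><suffix><ext> in the same directory.

    Hypothetical helper: returns the path of the copy, which can then be
    edited and passed to the training script instead of the original.
    """
    src = Path(source)
    if not src.is_file():
        raise FileNotFoundError(f"config file not found: {src}")

    dst = src.with_name(f"{src.stem}{suffix}{src.suffix}")
    if dst.exists():
        # Avoid silently discarding edits made to an earlier working copy.
        raise FileExistsError(f"{dst} already exists; refusing to overwrite")

    shutil.copyfile(src, dst)
    return dst
```

With the config used in the example above, `make_working_config("configuration/TubeR_CSN152_AVA21.yaml")` would create `configuration/TubeR_CSN152_AVA21_local.yaml` next to the original.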

TODO

[ ] Add tutorial and pre-trained weights for TubeR with long-term memory

[ ] Add weights for UCF24

Citing TubeR

@inproceedings{zhao2022tuber,
  title={TubeR: Tubelet transformer for video action detection},
  author={Zhao, Jiaojiao and Zhang, Yanyi and Li, Xinyu and Chen, Hao and Shuai, Bing and Xu, Mingze and Liu, Chunhui and Kundu, Kaustav and Xiong, Yuanjun and Modolo, Davide and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={13598--13607},
  year={2022}
}