TubeR: Tubelet Transformer for Video Action Detection

This repo contains the supporting code to reproduce the spatio-temporal action detection results of TubeR: Tubelet Transformer for Video Action Detection.

Updates

08/08/2022 Initial commits

Results and Models

AVA 2.1 Dataset

Backbone	Pretrain	#view	mAP	FLOPs	config	model
CSN-50	Kinetics-400	1 view	27.2	78G	config	S3
CSN-50 (with long-term context)	Kinetics-400	1 view	28.8	78G	config	Coming soon
CSN-152	Kinetics-400+IG65M	1 view	29.7	120G	config	S3
CSN-152 (with long-term context)	Kinetics-400+IG65M	1 view	31.7	120G	config	Coming soon

AVA 2.2 Dataset

Backbone	Pretrain	#view	mAP	FLOPs	config	model
CSN-152	Kinetics-400+IG65M	1 view	31.1	120G	config	S3
CSN-152 (with long-term context)	Kinetics-400+IG65M	1 view	33.4	120G	config	Coming soon

JHMDB Dataset

Backbone	#view	mAP@0.2	mAP@0.5	config	model
CSN-152	1 view	87.4	82.3	config	S3

Usage

The project is developed based on GluonCV-torch. Please refer to tutorial for details.

Dependency

The project has been tested with:

Torch 1.12 + CUDA 11.3
timm==0.4.5
tensorboardX
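
To compare a local environment against the versions listed above, a small check along the following lines may help. This is only a sketch using the standard-library importlib.metadata module: the helper names installed_version and check_environment are made up for illustration, the package name "torch" is assumed to be the installed distribution name for PyTorch, and versions are compared by simple prefix match. It does not check the CUDA version.

```python
from importlib import metadata

# Versions taken from the list above; None means "any version".
# Assumption: these are the distribution names as installed by pip.
TESTED = {"torch": "1.12", "timm": "0.4.5", "tensorboardX": None}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(tested=TESTED):
    """Map each package to (installed_version, matches_tested_prefix)."""
    report = {}
    for package, wanted in tested.items():
        found = installed_version(package)
        # Prefix match so that e.g. "1.12.1+cu113" counts as "1.12".
        ok = found is not None and (wanted is None or found.startswith(wanted))
        report[package] = (found, ok)
    return report


if __name__ == "__main__":
    for package, (found, ok) in check_environment().items():
        status = "ok" if ok else "check"
        print(f"{package}: {found or 'not installed'} ({status})")
```

A package reported as "check" is either missing or at a different version than the one the project was tested with; it may still work.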

Dataset

Please download asset.zip and unzip it into ./datasets.

[AVA] Please refer to DATASET.md for downloading and pre-processing the AVA dataset.
[JHMDB] Please refer to JHMDB for the JHMDB dataset and to Dataset Section for the UCF dataset. You can also refer to ACT-Detector to prepare these two datasets.

Inference

To run inference, first modify the config file:

Set the correct WORLD_SIZE, GPU_WORLD_SIZE, DIST_URL, WOLRD_URLS based on your experiment setup.
Set LABEL_PATH, ANNO_PATH, DATA_PATH to the corresponding local directories.
Download the pre-trained model and set PRETRAINED_PATH to the model path.
Make sure LOAD and LOAD_FC are set to True.
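
The checklist above can be verified before launching a run. The sketch below works on a config that has already been loaded into a nested Python dict (for example with yaml.safe_load from PyYAML). The helpers find_key and inference_problems are hypothetical and not part of this repo, and the nesting in the example dict is invented for illustration; only the key names come from the list above.

```python
def find_key(config, key):
    """Depth-first search for `key` in a nested dict; returns None if absent."""
    if isinstance(config, dict):
        if key in config:
            return config[key]
        for value in config.values():
            found = find_key(value, key)
            if found is not None:
                return found
    return None


def inference_problems(config):
    """Return a list of human-readable problems for an inference config."""
    problems = []
    # Both flags must be True so that the pre-trained weights are loaded.
    for flag in ("LOAD", "LOAD_FC"):
        if find_key(config, flag) is not True:
            problems.append(f"{flag} should be True for inference")
    # Paths must be filled in; this does not check that they exist on disk.
    for path_key in ("LABEL_PATH", "ANNO_PATH", "DATA_PATH", "PRETRAINED_PATH"):
        if not find_key(config, path_key):
            problems.append(f"{path_key} is not set")
    return problems


# Hypothetical config layout and paths, for illustration only.
example = {
    "CONFIG": {
        "MODEL": {"LOAD": True, "LOAD_FC": False,
                  "PRETRAINED_PATH": "/models/tuber.pth"},
        "DATA": {"LABEL_PATH": "/data/labels", "ANNO_PATH": "/data/anno",
                 "DATA_PATH": ""},
    }
}
for problem in inference_problems(example):
    print(problem)
```

An empty list means the flags and paths named in the checklist are all set; the distributed settings (WORLD_SIZE and related keys) still need to be checked by hand.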

Then run:

# run testing
python3 eval_tuber_ava.py <CONFIG_FILE>

# for example, to evaluate the CSN-152 model on AVA 2.1, run:
python3 eval_tuber_ava.py configuration/TubeR_CSN152_AVA21.yaml

Training

To train TubeR from scratch, first modify the config file:

Set the correct WORLD_SIZE, GPU_WORLD_SIZE, DIST_URL, WOLRD_URLS based on your experiment setup.
Set LABEL_PATH, ANNO_PATH, DATA_PATH to the corresponding local directories.
Download the pre-trained feature backbone and transformer weights, then set PRETRAIN_BACKBONE_DIR (CSN50, CSN152) and PRETRAIN_TRANSFORMER_DIR (DETR) accordingly.
Make sure LOAD and LOAD_FC are set to False.
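
Since the inference and training setups differ in the LOAD and LOAD_FC flags, one way to avoid editing them by hand is to derive a training config from an inference config in code. The sketch below assumes the config is available as a nested Python dict. The helpers set_key_everywhere and to_training_config are hypothetical, not part of this repo, and the example layout is invented; the pre-trained backbone and transformer paths still have to be set separately.

```python
import copy


def set_key_everywhere(config, key, value):
    """Set `key` to `value` in every nested dict that already has it.

    Returns the number of places that were updated.
    """
    changed = 0
    if isinstance(config, dict):
        if key in config:
            config[key] = value
            changed += 1
        for child in config.values():
            changed += set_key_everywhere(child, key, value)
    return changed


def to_training_config(config):
    """Return a deep copy of `config` with LOAD and LOAD_FC set to False."""
    training = copy.deepcopy(config)  # leave the caller's config untouched
    for flag in ("LOAD", "LOAD_FC"):
        if set_key_everywhere(training, flag, False) == 0:
            raise KeyError(f"{flag} not found in config")
    return training


# Hypothetical layout, for illustration only.
inference_cfg = {"CONFIG": {"MODEL": {"LOAD": True, "LOAD_FC": True}}}
training_cfg = to_training_config(inference_cfg)
print(training_cfg["CONFIG"]["MODEL"])
```

Working on a copy keeps the inference config usable for later evaluation runs.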