SVFormer: Semi-supervised Video Transformer for Action Recognition

This is the official implementation of the paper SVFormer.

@article{svformer,
  title={SVFormer: Semi-supervised Video Transformer for Action Recognition},
  author={Zhen Xing and Qi Dai and Han Hu and Jingjing Chen and Zuxuan Wu and Yu-Gang Jiang},
  journal={arXiv preprint arXiv:2211.13222},
  year={2022}
}

Installation

We tested the released code with the following conda environment:

conda create -n svformer python=3.7
conda activate svformer
bash env.sh

Data Preparation

We expect the --train_list_path and --val_list_path command-line arguments to each point to a data list file in the following format:

<path_1> <label_1>
<path_2> <label_2>
...
<path_n> <label_n>

where <path_i> points to a video file and <label_i> is an integer between 0 and num_classes - 1. --num_classes must also be specified on the command line.

Additionally, <path_i> may be a relative path when --data_root is specified; the actual path is then resolved relative to the path passed as --data_root.
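
As a quick sanity check before training, a list file can be parsed and validated with a few lines of Python. This is only a minimal sketch of the format described above, not the repository's own loader; the function name read_data_list is hypothetical, and splitting on the last whitespace run (so that paths containing spaces still parse) is an assumption.

```python
import os
from typing import List, Optional, Tuple


def read_data_list(list_path: str, num_classes: int,
                   data_root: Optional[str] = None) -> List[Tuple[str, int]]:
    """Parse '<path> <label>' lines and check that each label is in range.

    Hypothetical helper for illustration; not part of the SVFormer code.
    """
    samples = []
    with open(list_path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Assumption: the label is the last whitespace-separated token.
            path, label_str = line.rsplit(maxsplit=1)
            label = int(label_str)
            if not 0 <= label < num_classes:
                raise ValueError(
                    f"{list_path}:{line_no}: label {label} is outside "
                    f"[0, {num_classes - 1}]"
                )
            if data_root is not None:
                # os.path.join leaves an absolute path unchanged.
                path = os.path.join(data_root, path)
            samples.append((path, label))
    return samples
```

Calling read_data_list("train_list.txt", num_classes=51, data_root="/data/hmdb51") would return a list of (path, label) pairs, where the file name and data root shown here are placeholders.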

We provide an example list in list_hmdb_40.
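
For a dataset that is not yet listed, a file in this format can be generated from the directory tree. The sketch below assumes a layout of one sub-folder per class directly under the data root, which may not match your data; build_data_list and VIDEO_EXTS are hypothetical names, and labels are assigned by sorted class-folder name. The emitted paths are relative, so they are meant to be used together with --data_root.

```python
import os
from typing import Dict, List, Tuple

# Assumption: which file extensions count as videos.
VIDEO_EXTS = (".avi", ".mp4", ".mkv", ".webm")


def build_data_list(data_root: str) -> Tuple[List[str], Dict[str, int]]:
    """Build '<relative_path> <label>' lines from <data_root>/<class>/<video>.

    Hypothetical helper for illustration; not part of the SVFormer code.
    """
    classes = sorted(
        d for d in os.listdir(data_root)
        if os.path.isdir(os.path.join(data_root, d))
    )
    # Labels run from 0 to num_classes - 1, in sorted class-name order.
    class_to_label = {name: idx for idx, name in enumerate(classes)}
    lines = []
    for name in classes:
        class_dir = os.path.join(data_root, name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(VIDEO_EXTS):
                lines.append(f"{name}/{fname} {class_to_label[name]}")
    return lines, class_to_label
```

Writing "\n".join(lines) to a text file gives a list that can be passed as --train_list_path or --val_list_path, with len(class_to_label) as the value for --num_classes.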

Training script for SVFormer-B on the Kinetics-400 1% setting

bash train.sh
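
In the semi-supervised setting, the percentage is understood here as the fraction of training videos whose labels are used, with the rest treated as unlabeled. This README does not say how the splits were drawn, so the following is only a sketch of one plausible approach (per-class random sampling with at least one labeled video per class), not the authors' procedure; split_labeled and its parameters are hypothetical.

```python
import random
from collections import defaultdict
from typing import List, Tuple


def split_labeled(lines: List[str], ratio: float,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split '<path> <label>' lines into labeled and unlabeled subsets.

    Assumption: sampling is done per class so every class keeps at least
    one labeled example. Hypothetical helper, not the paper's split code.
    """
    by_label = defaultdict(list)
    for line in lines:
        label = line.rsplit(maxsplit=1)[1]  # last token is the label
        by_label[label].append(line)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    labeled, unlabeled = [], []
    for label in sorted(by_label):
        items = list(by_label[label])
        rng.shuffle(items)
        k = max(1, round(len(items) * ratio))
        labeled.extend(items[:k])
        unlabeled.extend(items[k:])
    return labeled, unlabeled
```

With ratio=0.01, a class with 100 videos contributes one labeled line and 99 unlabeled lines.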

Main Results in the Paper

This is the original implementation, released for open-source use. We are still re-running some models; their scripts and checkpoints will be released later. The tables below report the accuracy from the original paper.

Backbone     UCF101-1%   UCF101-10%   Kinetics-400-1%   Kinetics-400-10%
SVFormer-S   31.4        79.1         32.6              61.6
SVFormer-B   46.3        86.7         49.1              69.4

Backbone     HMDB51-40%   HMDB51-50%   HMDB51-60%
SVFormer-S   56.2         58.2         59.7
SVFormer-B   61.6         64.4         68.2

Acknowledgements

Our code is modified from TimeSformer. Thanks for their awesome work!