This is the official PyTorch implementation of our VCLR paper.
@article{kuang2021vclr,
title={Video Contrastive Learning with Global Context},
author={Haofei Kuang and Yi Zhu and Zhi Zhang and Xinyu Li and Joseph Tighe and Sören Schwertfeger and Cyrill Stachniss and Mu Li},
journal={arXiv preprint arXiv:2108.02722},
year={2021}
}
conda create --name vclr python=3.7
conda activate vclr
conda install numpy scipy scikit-learn matplotlib scikit-image
pip install torch==1.7.1 torchvision==0.8.2
pip install opencv-python tqdm termcolor gcc7 ffmpeg tensorflow==1.15.2
pip install mmcv-full==1.2.7
Please refer to PREPARE_DATA to prepare the datasets.
In this work, we follow SeCo and use the pretrained weights of MoCov2 as initialization.
cd ~
git clone https://github.com/amazon-research/video-contrastive-learning.git
cd video-contrastive-learning
mkdir pretrain && cd pretrain
wget https://dl.fbaipublicfiles.com/moco/moco_checkpoints/moco_v2_200ep/moco_v2_200ep_pretrain.pth.tar
cd ..
bash shell/main_train.sh
Checkpoints will be saved to ./results
To evaluate the effectiveness of self-supervised learning, we conduct a linear evaluation (probing) on the Kinetics400 dataset. We first extract features using the pretrained weights and then train an SVM classifier to see how well the learned features perform.
bash shell/eval_svm.sh
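The probing protocol described above can be illustrated with a minimal sketch. This is not the contents of shell/eval_svm.sh: the toy 2-D "features", the helper names `train_linear_svm`, `predict` and `make_toy_features`, the binary setup, and all hyperparameters are assumptions made for illustration. A real run would use features extracted with the pretrained weights and a multi-class library SVM.

```python
import random


def train_linear_svm(features, labels, epochs=50, lr=0.1, reg=0.01, seed=0):
    """Fit a binary linear SVM (labels in {-1, +1}) by subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # L2 shrinkage is applied at every step.
            w = [wj * (1.0 - lr * reg) for wj in w]
            # The hinge term only contributes when the margin is violated.
            if margin < 1.0:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def make_toy_features(n_per_class=50, seed=0):
    """Stand-in for extracted features: two Gaussian blobs (purely hypothetical data)."""
    rng = random.Random(seed)
    feats, labels = [], []
    for label, center in ((1, (2.0, 2.0)), (-1, (-2.0, -2.0))):
        for _ in range(n_per_class):
            feats.append([center[0] + rng.gauss(0, 0.5), center[1] + rng.gauss(0, 0.5)])
            labels.append(label)
    return feats, labels


train_x, train_y = make_toy_features(seed=0)
test_x, test_y = make_toy_features(seed=1)
w, b = train_linear_svm(train_x, train_y)
accuracy = sum(predict(w, b, x) == y for x, y in zip(test_x, test_y)) / len(test_y)
print(f"linear-probe accuracy on toy features: {accuracy:.2f}")
```

The backbone is never updated in this setup; only the classifier on top of the extracted features is trained, so the accuracy reflects the quality of the features themselves.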
Results
Arch | Pretrained dataset | Epoch | Pretrained model | Acc. on K400 |
---|---|---|---|---|
ResNet50 | Kinetics400 | 400 | Download link | 64.1 |
bash shell/eval_retrieval.sh
Results
Arch | Pretrained dataset | Epoch | Pretrained model | R@1 on UCF101 | R@1 on HMDB51 |
---|---|---|---|---|---|
ResNet50 | Kinetics400 | 400 | Download link | 70.6 | 35.2 |
ResNet50 | UCF101 | 400 | Download link | 46.8 | 17.6 |
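For readers unfamiliar with the R@1 columns, the sketch below shows one common way such a number is computed: each query clip retrieves its nearest gallery clip by cosine similarity, and the query counts as a hit when the retrieved clip has the same label. The function name `recall_at_1`, the toy features and labels, and the query/gallery split are assumptions for illustration, not the logic of shell/eval_retrieval.sh.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_1(query_feats, query_labels, gallery_feats, gallery_labels):
    """Fraction of queries whose single nearest gallery item shares their label."""
    hits = 0
    for q_feat, q_label in zip(query_feats, query_labels):
        sims = [cosine_similarity(q_feat, g) for g in gallery_feats]
        best = max(range(len(sims)), key=sims.__getitem__)
        hits += int(gallery_labels[best] == q_label)
    return hits / len(query_labels)


# Hypothetical 2-D features; real features would come from the pretrained backbone.
gallery_feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
gallery_labels = ["archery", "bowling", "diving"]
query_feats = [[0.9, 0.1], [0.1, 0.8], [0.2, 0.9], [-0.1, 1.0]]
query_labels = ["archery", "bowling", "archery", "diving"]

r1 = recall_at_1(query_feats, query_labels, gallery_feats, gallery_labels)
print(f"R@1 = {r1:.2f}")
```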
Here, we use mmaction2 for both tasks. If you are not familiar with mmaction2, you can read the official documentation.
Step 1: Install mmaction2
To make sure the results can be reproduced, please use our forked version of mmaction2 (version: 0.11.0):
conda activate vclr
cd ~
git clone https://github.com/KuangHaofei/mmaction2
cd mmaction2
pip install -v -e .
Step 2: Prepare the pretrained weights
Our pretrained backbone uses a different format from the mmaction2 backbone, so it must be converted to the mmaction2 format. We provide the converted versions of our K400 pretrained weights, TSN and TSM. We also provide the script for converting the weights; you can find it here.
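A format conversion of this kind typically comes down to renaming the keys of a checkpoint's state dict. The sketch below illustrates the idea only; it is not the provided script. The function name `convert_keys` and the prefixes `module.encoder.` and `backbone.` are hypothetical, and plain strings stand in for tensors so the example runs without PyTorch.

```python
def convert_keys(state_dict, src_prefix, dst_prefix):
    """Rename keys that start with src_prefix so they start with dst_prefix; drop all other keys."""
    converted = {}
    for key, value in state_dict.items():
        if key.startswith(src_prefix):
            converted[dst_prefix + key[len(src_prefix):]] = value
    return converted


# Placeholder values stand in for tensors; the key names here are assumptions.
pretrained = {
    "module.encoder.conv1.weight": "tensor-0",
    "module.encoder.layer1.0.conv1.weight": "tensor-1",
    "module.head.fc.weight": "tensor-2",  # hypothetical pretraining head, not needed downstream
}

converted = convert_keys(pretrained, src_prefix="module.encoder.", dst_prefix="backbone.")
print(sorted(converted))
```

To adapt this to a real checkpoint, inspect the actual key names on both sides first and choose the prefixes accordingly.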
Move the pretrained weights to the checkpoints directory:
cd ~/mmaction2
mkdir checkpoints
wget -P checkpoints https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_mm.pth
wget -P checkpoints https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_mm_tsm.pth
Make sure you have prepared the datasets and environment following the previous steps. Assuming you are in the root directory of mmaction2, follow the steps below to fine-tune the TSN or TSM models for action recognition.
For each dataset, the training and test settings can be found in the configuration files.
UCF101
./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_ucf101.py 8 \
--validate --seed 0 --deterministic
python tools/test.py configs/recognition/tsn/vclr/tsn_ucf101.py \
work_dirs/vclr/ucf101/latest.pth \
--eval top_k_accuracy mean_class_accuracy --out result.json
HMDB51
./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_hmdb51.py 8 \
--validate --seed 0 --deterministic
python tools/test.py configs/recognition/tsn/vclr/tsn_hmdb51.py \
work_dirs/vclr/hmdb51/latest.pth \
--eval top_k_accuracy mean_class_accuracy --out result.json
SomethingSomethingV2: TSN
./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_sthv2.py 8 \
--validate --seed 0 --deterministic
python tools/test.py configs/recognition/tsn/vclr/tsn_sthv2.py \
work_dirs/vclr/tsn_sthv2/latest.pth \
--eval top_k_accuracy mean_class_accuracy --out result.json
SomethingSomethingV2: TSM
./tools/dist_train.sh configs/recognition/tsm/vclr/tsm_sthv2.py 8 \
--validate --seed 0 --deterministic
python tools/test.py configs/recognition/tsm/vclr/tsm_sthv2.py \
work_dirs/vclr/tsm_sthv2/latest.pth \
--eval top_k_accuracy mean_class_accuracy --out result.json
ActivityNet
./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_activitynet.py 8 \
--validate --seed 0 --deterministic
python tools/test.py configs/recognition/tsn/vclr/tsn_activitynet.py \
work_dirs/vclr/tsn_activitynet/latest.pth \
--eval top_k_accuracy mean_class_accuracy --out result.json
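The two metric names passed to `--eval` in the commands above can be illustrated with a small sketch. This is a plain-Python illustration of the usual definitions, assuming top-k accuracy counts a sample as correct when its label is among the k highest scores and mean class accuracy averages the per-class accuracy. The function bodies and the toy scores are assumptions, not mmaction2's implementation.

```python
from collections import defaultdict


def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += int(label in top_k)
    return hits / len(labels)


def mean_class_accuracy(scores, labels):
    """Average of per-class top-1 accuracy, so every class weighs the same."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row, label in zip(scores, labels):
        pred = max(range(len(row)), key=lambda c: row[c])
        total[label] += 1
        correct[label] += int(pred == label)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical class scores for four clips over three classes.
scores = [
    [0.7, 0.2, 0.1],  # label 0, predicted 0
    [0.6, 0.3, 0.1],  # label 0, predicted 0
    [0.5, 0.4, 0.1],  # label 1, predicted 0 (correct only within the top 2)
    [0.1, 0.2, 0.7],  # label 2, predicted 2
]
labels = [0, 0, 1, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
mca = mean_class_accuracy(scores, labels)
print(top1, top2, mca)
```

The two metrics diverge when classes are imbalanced: here three of four clips are correct at top-1, but class 1 is never predicted correctly, which pulls the mean class accuracy down.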
Results
Arch | Dataset | Finetuned model | Acc. |
---|---|---|---|
TSN | UCF101 | Download link | 85.6 |
TSN | HMDB51 | Download link | 54.1 |
TSN | SomethingSomethingV2 | Download link | 33.3 |
TSM | SomethingSomethingV2 | Download link | 52.0 |
TSN | ActivityNet | Download link | 71.9 |
Step 1: Follow the previous section. We assume the finetuned model is saved at work_dirs/vclr/tsn_activitynet/latest.pth
Step 2: Extract ActivityNet features
cd ~/mmaction2/tools/data/activitynet/
python tsn_feature_extraction.py --data-prefix /home/ubuntu/data/ActivityNet/rawframes \
--data-list /home/ubuntu/data/ActivityNet/anet_train_video.txt \
--output-prefix /home/ubuntu/data/ActivityNet/rgb_feat \
--modality RGB --ckpt /home/ubuntu/mmaction2/work_dirs/vclr/tsn_activitynet/latest.pth
python tsn_feature_extraction.py --data-prefix /home/ubuntu/data/ActivityNet/rawframes \
--data-list /home/ubuntu/data/ActivityNet/anet_val_video.txt \
--output-prefix /home/ubuntu/data/ActivityNet/rgb_feat \
--modality RGB --ckpt /home/ubuntu/mmaction2/work_dirs/vclr/tsn_activitynet/latest.pth
python activitynet_feature_postprocessing.py \
--rgb /home/ubuntu/data/ActivityNet/rgb_feat \
--dest /home/ubuntu/data/ActivityNet/mmaction_feat
Note that the root directory of ActivityNet is /home/ubuntu/data/ActivityNet/ in our case. Please replace it with your own directory.
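Since the dataset root appears in every argument, one way to avoid editing each path by hand is to derive them all from a single root. The sketch below builds the argument list for one split using only the flags shown in the commands above. The helper name `feature_extraction_args` and the example root `/data/ActivityNet` are hypothetical, and the directory layout under the root is assumed to match ours.

```python
from pathlib import PurePosixPath


def feature_extraction_args(anet_root, ckpt, split):
    """Build the tsn_feature_extraction.py command for one split ('train' or 'val')."""
    root = PurePosixPath(anet_root)
    return [
        "python", "tsn_feature_extraction.py",
        "--data-prefix", str(root / "rawframes"),
        "--data-list", str(root / f"anet_{split}_video.txt"),
        "--output-prefix", str(root / "rgb_feat"),
        "--modality", "RGB",
        "--ckpt", str(ckpt),
    ]


# Hypothetical root directory; substitute your own.
args = feature_extraction_args(
    "/data/ActivityNet",
    "work_dirs/vclr/tsn_activitynet/latest.pth",
    "train",
)
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run` from within tools/data/activitynet/, or printed and pasted into a shell.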
Step 3: Train and test the BMN model
cd ~/mmaction2
./tools/dist_train.sh configs/localization/bmn/bmn_acitivitynet_feature_vclr.py 2 \
--work-dir work_dirs/vclr/bmn_activitynet --validate --seed 0 --deterministic --bmn
python tools/test.py configs/localization/bmn/bmn_acitivitynet_feature_vclr.py \
work_dirs/vclr/bmn_activitynet/latest.pth \
--bmn --eval AR@AN --out result.json
Results
Arch | Dataset | Finetuned model | AUC | AR@100 |
---|---|---|---|---|
BMN | ActivityNet | Download link | 65.5 | 73.8 |
We provide our feature visualization code here.
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.