This repo contains code for extracting teacher representations, training the proposed method, and probing the resulting representation from our SCALE paper (ICCV 2023).
Spatio-Temporal Crop Aggregation for Video Representation Learning
Sepehr Sameni, Simon Jenni, Paolo Favaro
University of Bern, Adobe Research, University of Bern
ICCV 2023
To load videos quickly, we use the video_reader backend, which requires building TorchVision from source. We used CUDA 11.6 and Python 3.8; please follow this guide to install the required packages.
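Before running anything, it can help to confirm that the source build actually enabled video_reader. The snippet below is a minimal sketch, not part of this repo: the helper name `video_reader_available` is hypothetical, and it only relies on TorchVision's `set_video_backend` and `get_video_backend`. Depending on the TorchVision version, requesting an unavailable backend either warns or raises, so both cases are handled. Note that a successful check leaves the process-wide backend set to video_reader.

```python
import warnings


def video_reader_available() -> bool:
    """Return True if TorchVision can switch to the video_reader backend.

    Hypothetical helper for illustration; it is not part of this repo.
    """
    try:
        import torchvision
    except ImportError:
        # TorchVision is not installed at all.
        return False
    try:
        with warnings.catch_warnings():
            # Some versions only warn when video_reader was not compiled in.
            warnings.simplefilter("ignore")
            torchvision.set_video_backend("video_reader")
        # If the switch was refused silently, the backend is unchanged.
        return torchvision.get_video_backend() == "video_reader"
    except (RuntimeError, AttributeError):
        # Some versions raise instead of warning, or lack these functions.
        return False


if __name__ == "__main__":
    if video_reader_available():
        print("video_reader backend is available")
    else:
        print("video_reader backend is NOT available; rebuild TorchVision from source")
```

If the check fails, the build most likely did not enable video decoding support; the install guide is the place to look.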
Below are example commands for each stage: teacher feature extraction, training, and probing.
python extract_features.py --model byol --dataset ucf # repeat for each pretrained teacher
python train.py --exp_name BYOL-UCF --epochs 1000 --initialization_model byol --dataset ucf
python eval.py --exp_name BYOL-UCF --dataset ucf --load -1 --freeze
If you find our code useful for your research, please cite our paper.
@article{Sameni2022SpatioTemporalCA,
title={Spatio-Temporal Crop Aggregation for Video Representation Learning},
author={Sepehr Sameni and Simon Jenni and Paolo Favaro},
journal={ArXiv},
year={2022},
volume={abs/2211.17042},
url={https://api.semanticscholar.org/CorpusID:254096149}
}