Official PyTorch implementation of "Spatio-Temporal Crop Aggregation for Video Representation Learning" (ICCV 2023)
SCALE: Spatio-Temporal Crop Aggregation for Video Representation Learning

This repo contains code for extracting teacher representations, training the proposed method, and probing the resulting representation from our SCALE paper (ICCV 2023).

Spatio-Temporal Crop Aggregation for Video Representation Learning
Sepehr Sameni, Simon Jenni, Paolo Favaro
University of Bern, Adobe Research, University of Bern
ICCV 2023

Getting Started

To load videos quickly, we use TorchVision's video_reader backend, which requires building TorchVision from source. We used CUDA 11.6 and Python 3.8; please follow this guide to install the required packages.
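Before running anything, it can help to confirm that the source build actually exposes the video_reader backend. The snippet below is a minimal sketch, not part of this repo: `video_reader_available` is a hypothetical helper name, and it relies only on TorchVision's public `get_video_backend` / `set_video_backend` functions. Depending on the TorchVision release, requesting an unavailable backend either raises a `RuntimeError` or only emits a warning, so the sketch handles both cases.

```python
import warnings


def video_reader_available() -> bool:
    """Return True if TorchVision can switch to the video_reader backend.

    Hypothetical helper for illustration; it is not defined in this repo.
    """
    try:
        import torchvision
    except (ImportError, OSError):
        # TorchVision is missing or its compiled libraries failed to load.
        return False

    previous = torchvision.get_video_backend()
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            torchvision.set_video_backend("video_reader")
        # Some releases only warn on failure, so confirm the switch happened.
        ok = torchvision.get_video_backend() == "video_reader"
    except RuntimeError:
        # Other releases raise when TorchVision was built without video_reader.
        ok = False
    finally:
        # Restore whatever backend was active before the check.
        torchvision.set_video_backend(previous)
    return ok


if __name__ == "__main__":
    print("video_reader available:", video_reader_available())
```

If this prints `False`, TorchVision was most likely installed from a prebuilt wheel rather than built from source.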

Usage

Below are example commands for each stage: feature extraction, training, and probing.

Extracting Pretrained Features

python extract_features.py --model train --dataset ucf # for all the pretrained teachers

Training SCALE

python train.py --exp_name BYOL-UCF --epochs 1000 --initialization_model byl --dataset ucf

Probing SCALE

python eval.py --exp_name BYOL-UCF --dataset ucf --load -1 --freeze 
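The three stages above run in order (extract, train, probe), so they can be chained from one driver script. The following is a rough sketch under stated assumptions: `build_commands` and `run_pipeline` are hypothetical names that do not exist in this repo, and every flag and value is copied verbatim from the example commands above rather than taken from the scripts' argument parsers. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys


def build_commands(exp_name="BYOL-UCF", dataset="ucf",
                   init_model="byl", epochs=1000):
    """Build the argv lists for the three stages (hypothetical helper).

    Flag names and default values mirror the example commands in this README.
    """
    py = sys.executable  # reuse the interpreter running this script
    return [
        # 1. Extract features from the pretrained teachers.
        [py, "extract_features.py", "--model", "train", "--dataset", dataset],
        # 2. Train SCALE on top of the extracted features.
        [py, "train.py", "--exp_name", exp_name, "--epochs", str(epochs),
         "--initialization_model", init_model, "--dataset", dataset],
        # 3. Probe the trained representation.
        [py, "eval.py", "--exp_name", exp_name, "--dataset", dataset,
         "--load", "-1", "--freeze"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline as soon as one stage fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_commands(), dry_run=True)
```

Passing `dry_run=False` executes the stages from the repository root. Using `check=True` means a failed extraction does not silently lead to training on missing features.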

Reference

If you find our code useful for your research, please cite our paper.

@article{Sameni2022SpatioTemporalCA,
  title={Spatio-Temporal Crop Aggregation for Video Representation Learning},
  author={Sepehr Sameni and S. Jenni and Paolo Favaro},
  journal={ArXiv},
  year={2022},
  volume={abs/2211.17042},
  url={https://api.semanticscholar.org/CorpusID:254096149}
}