This repository contains the PyTorch implementation of the paper Self-Supervised Video Similarity Learning. It provides code for training a video similarity learning network with self-supervision. To facilitate reproduction of the paper's results, it also provides the evaluation code, the extracted features for the employed video datasets, and pre-trained models.
Clone this repo
$ git clone https://github.com/gkordo/s2vs.git
$ cd s2vs
Install the required packages
$ pip install -r requirements.txt
Extract the frames from the videos in the dataset used for training.
$ ffmpeg -nostdin -y -vf fps=1 -start_number 0 -q 0 ${video_id}/%05d.jpg -i <path_to_video>
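To apply the command above to a whole directory of videos, a small wrapper can help. The following is a minimal sketch and not part of the repository: the function names `build_ffmpeg_command` and `extract_frames`, the flat directory of video files, and the use of the file stem as the video id are all assumptions made for illustration. It creates each output directory before calling ffmpeg, since ffmpeg expects the directory to exist.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, frames_dir):
    """Build the frame-extraction command from this README as an argument list."""
    return [
        "ffmpeg", "-nostdin", "-y",
        "-vf", "fps=1",            # sample one frame per second
        "-start_number", "0",      # first frame is saved as 00000.jpg
        "-q", "0",                 # quality setting, as in the command above
        str(Path(frames_dir) / "%05d.jpg"),
        "-i", str(video_path),
    ]


def extract_frames(videos_dir, output_root):
    """Run ffmpeg for every file in videos_dir (hypothetical flat layout)."""
    for video_path in sorted(Path(videos_dir).iterdir()):
        if not video_path.is_file():
            continue
        # Assumption: the file name without its extension serves as the video id.
        frames_dir = Path(output_root) / video_path.stem
        frames_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_ffmpeg_command(video_path, frames_dir), check=True)
```

Calling `extract_frames("<path_to_videos>", "<path_to_frames>")` would then produce one folder of numbered JPEG frames per video.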
Edit `scripts/train_ssl.sh` to configure the training session.
Choose the augmentation types you want to include during training by providing the appropriate values to the `--augmentations` argument. Provide a string that contains `GT` for Global Transformations, `FT` for Frame Transformations, `TT` for Temporal Transformations, and `ViV` for Video-in-Video.
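As an illustration of how such a string can be interpreted, the sketch below splits a value like `GT,FT,TT,ViV` into the four augmentation flags. The comma separator, the `parse_augmentations` name, and the validation behavior are assumptions made for this example; check `scripts/train_ssl.sh` and the training code for the exact format the repository expects.

```python
# Augmentation codes listed in this README and what they stand for.
AUGMENTATION_NAMES = {
    "GT": "Global Transformations",
    "FT": "Frame Transformations",
    "TT": "Temporal Transformations",
    "ViV": "Video-in-Video",
}


def parse_augmentations(value, separator=","):
    """Return a dict mapping each known code to True/False (hypothetical helper)."""
    requested = [token.strip() for token in value.split(separator) if token.strip()]
    unknown = [token for token in requested if token not in AUGMENTATION_NAMES]
    if unknown:
        raise ValueError(f"Unknown augmentation codes: {unknown}")
    return {code: code in requested for code in AUGMENTATION_NAMES}
```

For example, `parse_augmentations("GT,ViV")` enables Global Transformations and Video-in-Video and leaves the other two disabled.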
Run the script as follows
$ bash scripts/train_ssl.sh
Once the training is over, a `model.pth` file will have been created in a path based on the provided `experiment_path` argument.
Download the datasets from the original sources:
Determine the pattern, based on the video ids, under which the video files are stored, e.g. `{id}/video.*` if the dataset follows this layout:
Dataset_dir
├── video_id1
│   └── video.mp4
├── video_id2
│   └── video.flv
│   ⋮
└── video_idN
    └── video.webm
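A pattern such as `{id}/video.*` can be resolved to a concrete file by substituting the video id and expanding the wildcard. The sketch below shows one way to do this with the standard library; `find_video_file` is a hypothetical helper, and `evaluation.py` may resolve the pattern differently.

```python
import glob
import os


def find_video_file(dataset_dir, pattern, video_id):
    """Return the first path matching the pattern for video_id, or None."""
    # Substitute the id placeholder, e.g. '{id}/video.*' -> 'video_id1/video.*'.
    relative = pattern.format(id=video_id)
    # Expand the wildcard so that any extension (mp4, flv, webm, ...) matches.
    matches = sorted(glob.glob(os.path.join(dataset_dir, relative)))
    return matches[0] if matches else None
```

Running this helper over a few video ids before starting an evaluation is a quick way to confirm that the chosen pattern matches the dataset layout.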
Run the `evaluation.py` script to evaluate a trained model.
$ python evaluation.py --dataset FIVR-200K --dataset_path <path_to_dataset> --pattern '{id}/video.*' --model_path <path_to_model>
Alternatively, run the script with the provided features:
$ python evaluation.py --dataset FIVR-200K --dataset_hdf5 <path_to_hdf5> --model_path <path_to_model>
If no value is given to the `--model_path` argument, then the pretrained `s2vs_dns` model is used.
The feature extractor and the pretrained models can be loaded through `torch.hub`:
import torch
feat_extractor = torch.hub.load('gkordo/s2vs:main', 'resnet50_LiMAC')
s2vs_dns = torch.hub.load('gkordo/s2vs:main', 's2vs_dns')
s2vs_vcdb = torch.hub.load('gkordo/s2vs:main', 's2vs_vcdb')
## Citation
If you use this code for your research, please consider citing our papers:
```bibtex
@inproceedings{kordopatis2023s2vs,
title={Self-Supervised Video Similarity Learning},
author={Kordopatis-Zilos, Giorgos and Tolias, Giorgos and Tzelepis, Christos and Kompatsiaris, Ioannis and Patras, Ioannis and Papadopoulos, Symeon},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year={2023}
}
@inproceedings{kordopatis2019visil,
title={{ViSiL}: Fine-grained Spatio-Temporal Video Similarity Learning},
author={Kordopatis-Zilos, Giorgos and Papadopoulos, Symeon and Patras, Ioannis and Kompatsiaris, Ioannis},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2019}
}
```
For visualization examples of augmentation and similarity matrices, as well as model usage in code, have a look at this Colab notebook.
DnS - computational efficiency w/ selector network
ViSiL - original ViSiL approach
FIVR-200K - download our FIVR-200K dataset
This project is licensed under the MIT License - see the LICENSE file for details
Giorgos Kordopatis-Zilos (kordogeo@fel.cvut.cz)