
Simple vs complex temporal recurrences for video saliency prediction (BMVC 2019)
https://imatge-upc.github.io/SalEMA/

SalEMA

SalEMA is a video saliency prediction network. It applies an exponential moving average (EMA) to the states of a convolutional layer to produce state-of-the-art results. The architecture was trained on the DHF1K dataset.
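At its core, the temporal recurrence is just an exponential moving average of a convolutional activation, blended frame by frame with a factor alpha. The sketch below illustrates the idea only; the module name, layer shapes, and reset logic are placeholders and not the exact ones used in this repository.

```python
# Minimal sketch of an EMA recurrence at one convolutional layer
# (illustrative only; names and shapes are placeholders, not SalEMA's source).
import torch
import torch.nn as nn

class EMAConv(nn.Module):
    def __init__(self, in_channels, out_channels, alpha=0.1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.alpha = alpha
        self.state = None  # running average of this layer's activations

    def forward(self, x):
        out = self.conv(x)
        if self.state is None:
            self.state = out
        else:
            # Exponential moving average over time: blend the new activation
            # with the state accumulated from previous frames.
            self.state = self.alpha * out + (1 - self.alpha) * self.state
        return self.state

    def reset(self):
        # Call between videos so state does not leak across clips.
        self.state = None
```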

Publication

Find the published version of our work here: arXiv, or check our friendly summary on Medium. Also, if this project has been helpful to your work, please consider citing us:


@inproceedings{linardos2019simple,
  author    = {Panagiotis Linardos and
               Eva Mohedano and
               Juan Jos{\'{e}} Nieto and
               Noel E. O'Connor and
               Xavier Gir{\'{o}}{-}i{-}Nieto and
               Kevin McGuinness},
  title     = {Simple vs complex temporal recurrences for video saliency prediction},
  booktitle = {30th British Machine Vision Conference 2019, {BMVC} 2019, Cardiff,
               UK, September 9-12, 2019},
  pages     = {182},
  publisher = {{BMVA} Press},
  year      = {2019},
  url       = {https://bmvc2019.org/wp-content/uploads/papers/0952-paper.pdf},
  timestamp = {Thu, 30 Apr 2020 17:36:09 +0200},
  biburl    = {https://dblp.org/rec/conf/bmvc/LinardosMNONM19.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Model

Architecture diagram: TemporalEDmodel

Sample video (click to be redirected to YouTube):

[video thumbnail]

Installation

git clone https://github.com/Linardos/SalEMA

Inference

You may use our pretrained model for inference on any of the three datasets: DHF1K [link], Hollywood-2 [link], UCF-sports [link], or on your own dataset, so long as it follows a specific folder structure (see below).

To perform inference on DHF1K validation set:

python inference.py -dataset=DHF1K -pt_model=SalEMA30.pt -alpha=0.1 -start=600 -end=700 -dst=/path/to/output -src=/path/to/DHF1K

To perform inference on the Hollywood-2 or UCF-sports test set (because of the way these datasets are structured, it is convenient to use the same path for dst and src):

python inference.py -dataset=Hollywood-2 -pt_model=SalEMA30.pt -alpha=0.1 -dst=/path/to/Hollywood-2/testing -src=/path/to/Hollywood-2/testing
python inference.py -dataset=UCF-sports -pt_model=SalEMA30.pt -alpha=0.1 -dst=/path/to/UCF-sports/testing -src=/path/to/UCF-sports/testing

To perform inference on your own dataset, make sure it follows a simple folder structure (one superfolder, given as the root input, which contains folders of frames) and use the tag "other":

python inference.py -dataset=other -alpha=0.1 -pt_model=SalEMA30.pt -dst=/path/to/output -src=/path/to/superfolder/frames

If your dataset follows a more idiosyncratic structure, you might need to modify the data_loader source code.
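For reference, the layout the "other" option expects is roughly one subfolder of frames per video under the superfolder. The sketch below is only a sanity check under that assumption; the folder and file names are hypothetical, and the authoritative expectations live in the data_loader source code.

```python
# Sketch of the expected "other" layout and a quick sanity check
# (hypothetical names; the authoritative logic lives in the data_loader module).
#
#   /path/to/superfolder/frames/
#       video_001/   0001.png 0002.png ...
#       video_002/   0001.png 0002.png ...
import os
import sys

def check_layout(root):
    """Print the per-video frame counts found under the superfolder."""
    for video in sorted(os.listdir(root)):
        video_dir = os.path.join(root, video)
        if not os.path.isdir(video_dir):
            continue
        frames = [f for f in sorted(os.listdir(video_dir))
                  if f.lower().endswith((".png", ".jpg", ".jpeg"))]
        print(f"{video}: {len(frames)} frames")

if __name__ == "__main__":
    check_layout(sys.argv[1])  # e.g. python check_layout.py /path/to/superfolder/frames
```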

Training

To train on DHF1K using CUDA:

python train.py -dataset=DHF1K -pt_model=False -new_model=SalEMA -ema_loc=30 -start=1 -end=4 -src=/path/to/DHF1K -use_gpu='gpu' -epochs=7

To fine-tune on Hollywood-2 or UCF-sports using CUDA, load a pretrained model and use a higher number of epochs; training resumes from the epoch at which the loaded model stopped:

python train.py -dataset=Hollywood-2 -pt_model=SalEMA30.pt -new_model=SalEMA -ema_loc=30 -src=/path/to/Hollywood-2 -use_gpu='gpu' -epochs=10 -lr=0.0000001
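The resume-from-epoch behaviour mentioned above typically comes down to storing an epoch counter alongside the weights and optimiser state. The following is a minimal sketch of that pattern under assumed checkpoint keys and an illustrative loss choice; it is not the actual logic of SalEMA's train.py.

```python
# Minimal sketch of resuming fine-tuning from a saved checkpoint
# (hypothetical checkpoint keys and loss; not SalEMA's actual train.py logic).
import torch

def load_checkpoint(path, model, optimizer):
    """Restore weights, optimiser state, and the epoch to resume from."""
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["state_dict"])
    if "optimizer" in checkpoint:
        optimizer.load_state_dict(checkpoint["optimizer"])
    return checkpoint.get("epoch", 0)

def fine_tune(model, optimizer, data_loader, path, total_epochs=10):
    start_epoch = load_checkpoint(path, model, optimizer)
    for epoch in range(start_epoch, total_epochs):
        for frames, ground_truth in data_loader:
            optimizer.zero_grad()
            prediction = model(frames)
            # Binary cross-entropy is a common choice for saliency maps;
            # shown here purely for illustration.
            loss = torch.nn.functional.binary_cross_entropy_with_logits(
                prediction, ground_truth)
            loss.backward()
            optimizer.step()
```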