TengdaHan / DPC

Video Representation Learning by Dense Predictive Coding. Tengda Han, Weidi Xie, Andrew Zisserman.
MIT License

Video Representation Learning by Dense Predictive Coding

This repository contains the implementation of Dense Predictive Coding (DPC).

Links: [Arxiv] [Video] [Project page]

[Figure: DPC architecture]

DPC Results

Original results from our paper:

| Pretrain Dataset | Resolution | Backbone | Finetune Acc@1 (UCF101) | Finetune Acc@1 (HMDB51) |
|---|---|---|---|---|
| UCF101 | 128x128 | 2d3d-R18 | 60.6 | - |
| Kinetics400 | 128x128 | 2d3d-R18 | 68.2 | 34.5 |
| Kinetics400 | 224x224 | 2d3d-R34 | 75.7 | 35.7 |

Also re-implemented by other researchers:

| Pretrain Dataset | Resolution | Backbone | Finetune Acc@1 (UCF101) | Finetune Acc@1 (HMDB51) |
|---|---|---|---|---|
| UCF101 | 128x128 | 2d3d-R18 | 61.35 (@kayush95) | 45.31 (@kayush95) |

News

Installation

The implementation should work with Python >= 3.6, PyTorch >= 0.4, and torchvision >= 0.2.2.

The repo also requires cv2 (conda install -c menpo opencv), tensorboardX >= 1.7 (pip install tensorboardX), joblib, tqdm, ipdb.
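To confirm that an environment meets the minimum versions listed above, a small check along the following lines may help. This is a sketch and not part of the repository: the distribution names `torch` and `tensorboardX` are assumed to match what pip reports, the helper names are made up for illustration, and `importlib.metadata` needs Python 3.8 or newer.

```python
import re
from importlib import metadata

# Minimum versions quoted from the installation notes above.
# The distribution names used as keys are assumptions.
MINIMUM_VERSIONS = {
    "torch": "0.4",
    "torchvision": "0.2.2",
    "tensorboardX": "1.7",
}


def version_tuple(version):
    """Turn '1.13.1+cu117' into (1, 13, 1); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    a, b = version_tuple(installed), version_tuple(minimum)
    # Pad with zeros so that '0.4' and '0.4.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUM_VERSIONS):
    """Map each package name to 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

The comparison only looks at the leading numeric components, so pre-release tags and local build suffixes are ignored.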

Prepare data

Follow the instructions here.

Self-supervised training (DPC)

Change directory: cd DPC/dpc/

Evaluation: supervised action classification

Change directory: cd DPC/eval/

DPC-pretrained weights

It took us more than one week to train the 3D-ResNet18 DPC model on Kinetics-400 at 128x128 resolution, and about six weeks to train the 3D-ResNet34 DPC model on Kinetics-400 at 224x224 resolution (with 4 Nvidia P40 GPUs).
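When reusing pretrained weights in your own code, the checkpoint keys sometimes need normalizing before they match a freshly built model. The sketch below assumes, without confirmation from this README, that the weights are nested under a `state_dict` entry and that keys carry the `module.` prefix that `torch.nn.DataParallel` adds. The helper names and the toy key names are hypothetical.

```python
from collections import OrderedDict


def unwrap_checkpoint(checkpoint, key="state_dict"):
    """Return the weight dict whether or not it is nested under `key`."""
    if isinstance(checkpoint, dict) and key in checkpoint:
        return checkpoint[key]
    return checkpoint


def strip_prefix(state_dict, prefix="module."):
    """Copy state_dict, removing `prefix` from every key that starts with it."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Toy stand-in for a loaded checkpoint; real values would be tensors
# and the key names here are made up for illustration.
checkpoint = {
    "epoch": 10,
    "state_dict": {
        "module.backbone.conv1.weight": 1,
        "module.backbone.bn1.bias": 2,
    },
}
weights = strip_prefix(unwrap_checkpoint(checkpoint))
print(list(weights))
```

With PyTorch, the real checkpoint would typically come from `torch.load(path, map_location="cpu")`, and the cleaned dictionary could then be passed to `model.load_state_dict(weights, strict=False)`. Inspect the keys of the downloaded file first, since its actual layout may differ from what is assumed here.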

Download link:

Citation

If you find the repo useful for your research, please consider citing our paper:

@InProceedings{Han19dpc,
  author       = "Tengda Han and Weidi Xie and Andrew Zisserman",
  title        = "Video Representation Learning by Dense Predictive Coding",
  booktitle    = "Workshop on Large Scale Holistic Video Understanding, ICCV",
  year         = "2019",
}

For any questions, feel free to create an issue or contact Tengda Han (htd@robots.ox.ac.uk).