ColumbiaDVMM / CDC

CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos
68 stars 18 forks source link

Please find the whole code repo and models at CDC-bitbucket

Project website: http://www.ee.columbia.edu/ln/dvmm/researchProjects/cdc/

CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos

By Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, and Shih-Fu Chang.

Introduction:

CDC is a deep learning framework for per-frame labeling and precise temporal action localization in untrimmed long videos.

This code has been tested on Ubuntu 14.04 with a single NVIDIA GeForce GTX TITAN X card.

Please use "Issues" on github instead of bitbucket to ask questions or report bugs. Thanks.

[comment]: # () Current code is our rough version and we are improving its implementation details, while the current version suffices to run demo, repeat our experimental results, and train your own models.

License

CDC is released under the MIT License (refer to the LICENSE file for details).

Citing:

If you find CDC useful, please consider citing:

@inproceedings{cdc_shou_cvpr17,
  author = {Zheng Shou and Jonathan Chan and Alireza Zareian and Kazuyuki Miyazawa and Shih-Fu Chang},
  title = {CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos},
  year = {2017},
  booktitle = {CVPR} 
  }

We build this repo based on C3D and THUMOS Challenge 2014 . Please cite the following papers as well:

D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, Learning Spatiotemporal Features with 3D Convolutional Networks, ICCV 2015.

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, Caffe: Convolutional Architecture for Fast Feature Embedding, arXiv 2014.

A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, Large-scale Video Classification with Convolutional Neural Networks, CVPR 2014.

@misc{THUMOS14,
  author = "Jiang, Y.-G. and Liu, J. and Roshan Zamir, A. and Toderici, G. and Laptev, I. and Shah, M. and Sukthankar, R.",
  title = "{THUMOS} Challenge: Action Recognition with a Large Number of Classes",
  howpublished = "\url{http://crcv.ucf.edu/THUMOS14/}",
  Year = {2014}
  }

@inproceedings{scnn_shou_wang_chang_cvpr16,
  author = {Zheng Shou and Dongang Wang and Shih-Fu Chang},
  title = {Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs},
  year = {2016},
  booktitle = {CVPR} 
  }

Installation:

Run demo:

Reproduce results on THUMOS 2014 dataset:

  1. Temporal localization:
    • cd THUMOS14/eval/TemporalActionLocalization/ and run matlab eval_thumos14.m
    • results are stored in res_CDC_thumos14.mat. we vary the overlap threshold IoU used in evaluation from 0.3 to 0.7

Train your own model: