fanyix / STMN

[ECCV 2018] Video Object Detection with an Aligned Spatial-Temporal Memory
http://fanyix.cs.ucdavis.edu/project/stmn/project.html

This code repository contains an implementation of our ECCV video detection work. If you use this code, please cite:

Video Object Detection with an Aligned Spatial-Temporal Memory, Fanyi Xiao and Yong Jae Lee in ECCV 2018. [Bibtex]

Getting Started

Installation

The following installation procedure is tested under:

Ubuntu 16.04
CUDA 9.0
Torch 7

After these steps, you should end up with a code/data structure like the following:

$ROOT
  - STMN
  - dataset
    - ImageNetVID
      - Data
        - VID
          - train
          - val
          - test
      - exp
        - anno
          - train.t7
          - val.t7
          - test.t7
        - proposals
          - train
          - val
          - test
      - models
        - stmn.t7
        - rfcn.t7
        - resnet-101.t7
    - ImageNetDET
      - Annotations
      - ImageSets
      - Data
        - DET
          - train
          - val
      - exp
        - annotations
        - proposals
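To sanity-check that your layout matches the tree above, a hypothetical helper like this (not part of the repo) can list anything that is missing:

```python
# Hypothetical sanity check (not part of the repo) that the expected
# directory layout from the README exists under $ROOT.
import os

EXPECTED = [
    "STMN",
    "dataset/ImageNetVID/Data/VID/train",
    "dataset/ImageNetVID/exp/anno/train.t7",
    "dataset/ImageNetVID/exp/proposals/train",
    "dataset/ImageNetVID/models/stmn.t7",
    "dataset/ImageNetDET/Data/DET/train",
]

def missing_paths(root):
    """Return the expected paths that do not exist under root."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    for p in missing_paths(os.environ.get("ROOT", ".")):
        print("missing:", p)
```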
Compile the custom Torch modules and the COCO API with luarocks:

cd $ROOT/STMN/modules/rfcn
luarocks make rfcn-0.1-0.rockspec

cd $ROOT/STMN/modules/assemble
luarocks make assemble-0.1-0.rockspec

cd $ROOT/STMN/modules/stnbhwd
luarocks make stnbhwd-scm-1.rockspec
cd $ROOT/STMN/external/coco
luarocks make LuaAPI/rocks/coco-scm-1.rockspec

Training models

Evaluating models

Please note that the above commands are examples you can follow to reproduce our results; however, they will be slow due to the sheer number of frames you need to evaluate. In our own experiments we instead always parallelize this procedure with the help of the launch script provided in scripts/launcher.py. We highly encourage you to take a look at that script and parallelize the procedure as we do.
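The idea behind that parallelization can be sketched as follows: partition the evaluation videos into chunks and hand each chunk to a separate worker. This is an illustrative sketch only; the names and the worker body are hypothetical, not the actual API of scripts/launcher.py.

```python
# Illustrative sketch of chunk-based parallel evaluation, in the spirit of
# scripts/launcher.py. Names here are hypothetical; in the real setup each
# chunk would be handed to a separate evaluation process.
from multiprocessing import Pool

def split_into_chunks(videos, num_workers):
    """Round-robin partition of the video list across workers."""
    return [videos[i::num_workers] for i in range(num_workers)]

def evaluate_chunk(chunk):
    # Placeholder: a real worker would launch the Torch evaluation
    # on this subset of videos; here we just echo the chunk back.
    return chunk

if __name__ == "__main__":
    videos = [f"video_{i:04d}" for i in range(10)]
    chunks = split_into_chunks(videos, num_workers=3)
    with Pool(3) as pool:
        results = pool.map(evaluate_chunk, chunks)
    # Every video is covered exactly once across the chunks.
    assert sorted(v for c in results for v in c) == sorted(videos)
```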

After you're done with both commands shown above, you will have produced the raw detection results (without NMS), which we then feed into the temporal linkage procedure to generate our final detections. For this, we build on the excellent D&T code (note that using it requires a MATLAB license), with modifications so that only its dynamic-programming functionality is used.
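To give a sense of what that dynamic-programming step does, here is a minimal Viterbi-style sketch of linking per-frame detections into a track by maximizing summed confidences plus an IoU consistency term. The actual procedure lives in the MATLAB code under external/dp; the scoring weights and function names below are illustrative only.

```python
# Minimal sketch of Viterbi-style temporal linking of detections.
# Not the repo's actual implementation; weights are illustrative.

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def link_detections(frames, iou_weight=1.0):
    """frames: list over time of [(box, score), ...] per frame.
    Returns per-frame detection indices of the best-scoring track."""
    # best[t][i]: best accumulated score for a track ending at detection i of frame t
    best = [[s for _, s in frames[0]]]
    back = []
    for t in range(1, len(frames)):
        cur, ptr = [], []
        for box, score in frames[t]:
            cands = [best[t - 1][j] + score
                     + iou_weight * iou(frames[t - 1][j][0], box)
                     for j in range(len(frames[t - 1]))]
            j = max(range(len(cands)), key=cands.__getitem__)
            cur.append(cands[j])
            ptr.append(j)
        best.append(cur)
        back.append(ptr)
    # Backtrack from the highest-scoring end point.
    i = max(range(len(best[-1])), key=best[-1].__getitem__)
    track = [i]
    for ptr in reversed(back):
        i = ptr[i]
        track.append(i)
    return track[::-1]
```

In the full pipeline this linking is applied per class, and the linked detections are then rescored; the sketch above only shows the core recurrence.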

[Optional] Again, the above command will reproduce our results (80.5% mAP), but it can be slow to go over the entire evaluation set. To assist you in this process, we also provide parallelization utilities in run_dp.m and a launch script launcher.py under $ROOT/STMN/external/dp (note this is a different launch script from the one under $ROOT/STMN/scripts/ used above). Specifically, first set opts.scan_det in run_dp.m to true and launch it with $ROOT/STMN/external/dp/launcher.py. Then set opts.scan_det to false and opts.load_scan_det to true, and run the script again in a MATLAB console.

Acknowledgement

We developed this codebase from the great multipathnet code.