===============================================================================
This repository contains the code for our ICCV 2017 paper:
Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman
"Detect to Track and Track to Detect"
in Proc. ICCV 2017
This repository also contains results for ResNeXt-101 and Inception-v4 backbone networks, which perform slightly better (81.6% and 82.1% mAP on ImageNet VID val) than the ResNet-101 backbone (80.0% mAP) used in the conference version of the paper.
This code builds on the original Matlab version of R-FCN.
We are preparing a Python version of D&T that will support end-to-end training and inference of the RPN, Detector & Tracker.
If you find the code useful for your research, please cite our paper:
@inproceedings{feichtenhofer2017detect,
title={Detect to Track and Track to Detect},
author={Feichtenhofer, Christoph and Pinz, Axel and Zisserman, Andrew},
booktitle={International Conference on Computer Vision (ICCV)},
year={2017}
}
The code was tested on Ubuntu 14.04, 16.04 and Windows 10 using NVIDIA Titan X or Z GPUs.
If you have questions regarding the implementation please contact:
Christoph Feichtenhofer <feichtenhofer AT tugraz.at>
================================================================================
Download the code: git clone --recursive https://github.com/feichtenhofer/detect-track
Compile the code by running rfcn_build.m.
Edit the file get_root_path.m to adjust the models and data paths.
check_dl_model will attempt to download the model to the respective directories.
download_proposals will attempt to download & extract the proposal files to the respective directories.
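
The setup steps above could be chained in a short MATLAB script along the following lines. This is only a sketch: it assumes that rfcn_build, check_dl_model and download_proposals can be called without arguments from the repository root, and the `steps` and `missing` variables are hypothetical bookkeeping that is not part of the repository.

```matlab
% Sketch only: run from the repository root inside MATLAB, after editing
% get_root_path.m so that the model and data paths match your machine.
% Assumption: each step can be called without arguments.
steps = {'rfcn_build', 'check_dl_model', 'download_proposals'};
missing = {};   % hypothetical bookkeeping, not part of the repository
for i = 1:numel(steps)
    if exist(steps{i}, 'file') == 2   % 2 means a file found on the MATLAB path
        feval(steps{i});              % run the step
    else
        missing{end+1} = steps{i};    %#ok<AGROW>
    end
end
if ~isempty(missing)
    warning('Not found on the MATLAB path: %s', strjoin(missing, ', '));
end
```

The guard with exist only makes the script report which functions are not on the path instead of stopping at the first error; calling the three functions by hand in the same order is equivalent.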
Use script_Detect_ILSVRC_vid_ResNet_OHEM_rpn(); to train the image-based Detection network.
Use script_DetectTrack_ILSVRC_vid_ResNet_OHEM_rpn(); to train the video-based Detection & Tracking network.
Use rfcn_test(); to test the image-based Detection network.
Use rfcn_test_vid(); to test the video-based Detection & Tracking network with 2 frames at a time.
Use rfcn_test_vid_multiframe(); to test the video-based Detection & Tracking network with 3 frames at a time.
test_track.prototxt is the simplest form of D&T testing.
test_track_reg.prototxt is a D&T version that additionally regresses the tracking boxes before performing the ROI tracking. This procedure therefore produces tracks that tightly encompass the underlying objects, whereas test_track.prototxt tracks the proposal region (and therefore also the background area).
test_track_regcls.prototxt is a D&T version that additionally classifies the tracked region and computes the detection confidence as the mean of the detection score from the current frame and the detection score of the tracked region in the next frame. This method therefore produces better results, especially when the temporal distance between the frames becomes larger and more complementary information can be integrated from the tracked region.
Results (mAP in % on ImageNet VID val):
Method | test structure | ResNet-50 | ResNet-101 | ResNeXt-101 | Inception-v4 |
---|---|---|---|---|---|
Detect | test.prototxt | 72.1 | 74.1 | 75.9 | 77.9 |
Detect & Track | test_track.prototxt | 76.5 | 79.8 | 81.4 | 82.0 |
Detect & Track | test_track_regcls.prototxt | 76.7 | 80.0 | 81.6 | 82.1 |
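
The confidence computation described for test_track_regcls.prototxt (the mean of the detection score in the current frame and the score of the tracked region in the next frame) can be illustrated with a small MATLAB sketch. This is not the repository's implementation, which performs the computation inside the network definition; the names `fuse_scores`, `det_scores` and `track_scores` and the example values are hypothetical.

```matlab
% Sketch only: det_scores and track_scores are hypothetical per-class score
% vectors for a single box; they are not variables from the repository.
fuse_scores = @(det_scores, track_scores) (det_scores + track_scores) / 2;

det_scores   = [0.9 0.2];   % scores of the detection in the current frame
track_scores = [0.7 0.4];   % scores of the tracked region in the next frame
fused = fuse_scores(det_scores, track_scores);   % element-wise mean
disp(fused);
```

Averaging lets the next frame raise a score on which both frames agree and lower one that only a single frame supports.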
Our models were trained on region proposals extracted by a Region Proposal Network (RPN) that is trained on the same data as D&T. We use the RPN from craftGBD and provide the extracted proposals for training and testing on ImageNet VID and the DET subsets below.
Pre-computed object proposals for