ShaoqingRen / faster_rcnn

Faster R-CNN

This repo has been deprecated. Please see Detectron, which includes an implementation of Mask R-CNN.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

By Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun at Microsoft Research

Introduction

Faster R-CNN is an object detection framework based on deep convolutional networks. It consists of a Region Proposal Network (RPN) and an Object Detection Network. The two networks are trained to share convolutional layers, which makes testing fast.

Faster R-CNN was initially described in an arXiv tech report.

This repo contains a MATLAB re-implementation of Fast R-CNN. Details about Fast R-CNN are in: rbgirshick/fast-rcnn.

This code has been tested on Windows 7/8 64-bit, Windows Server 2012 R2, and Linux, with MATLAB 2014a.

A Python version is available at py-faster-rcnn.

License

Faster R-CNN is released under the MIT License (refer to the LICENSE file for details).

Citing Faster R-CNN

If you find Faster R-CNN useful in your research, please consider citing:

@article{ren15fasterrcnn,
    Author = {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun},
    Title = {{Faster R-CNN}: Towards Real-Time Object Detection with Region Proposal Networks},
    Journal = {arXiv preprint arXiv:1506.01497},
    Year = {2015}
}

Main Results

model               | training data                          | test data     | mAP   | time/img
--------------------|----------------------------------------|---------------|-------|---------
Faster RCNN, VGG-16 | VOC 2007 trainval                      | VOC 2007 test | 69.9% | 198ms
Faster RCNN, VGG-16 | VOC 2007 trainval + 2012 trainval      | VOC 2007 test | 73.2% | 198ms
Faster RCNN, VGG-16 | VOC 2012 trainval                      | VOC 2012 test | 67.0% | 198ms
Faster RCNN, VGG-16 | VOC 2007 trainval&test + 2012 trainval | VOC 2012 test | 70.4% | 198ms

Note: The mAP results are subject to random variation. We ran training for ZF net 5 times independently; the mAPs were 59.9 (as in the paper), 60.4, 59.5, 60.1, and 59.5, with a mean of 59.88 and a standard deviation of 0.39.
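
The mean and standard deviation quoted in the note can be reproduced from the five mAP values. The following is a minimal Python sketch for illustration only (the repository itself is MATLAB). It assumes the quoted 0.39 is the sample standard deviation (n - 1 in the denominator), which is the variant that matches the reported figure.

```python
import statistics

# mAP (%) of the five independent ZF-net runs quoted in the note above
zf_maps = [59.9, 60.4, 59.5, 60.1, 59.5]

mean_map = statistics.mean(zf_maps)
# Sample standard deviation (n - 1 denominator); assumed, since it matches the quoted 0.39
std_map = statistics.stdev(zf_maps)

print(f"mean = {mean_map:.2f}, std = {std_map:.2f}")  # mean = 59.88, std = 0.39
```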

Contents

  1. Requirements: software
  2. Requirements: hardware
  3. Preparation for Testing
  4. Testing Demo
  5. Preparation for Training
  6. Training
  7. Resources

Requirements: software

  1. Caffe build for Faster R-CNN (included in this repository, see external/caffe)
    • If you are using Windows, you may download a compiled mex file by running fetch_data/fetch_caffe_mex_windows_vs2013_cuda65.m
    • If you are using Linux or you want to compile for Windows, please follow the instructions on our Caffe branch.
  2. MATLAB

Requirements: hardware

GPU: Titan, Titan Black, Titan X, K20, K40, K80.

  1. Region Proposal Network (RPN)
    • 2GB GPU memory for ZF net
    • 5GB GPU memory for VGG-16 net
  2. Object Detection Network (Fast R-CNN)
    • 3GB GPU memory for ZF net
    • 8GB GPU memory for VGG-16 net
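
The memory figures above can be turned into a quick pre-flight check. The sketch below is illustrative only: `REQUIRED_GPU_MEMORY_GB` and `fits_on_gpu` are hypothetical names, not part of this repository. It assumes the listed figures are minimum requirements and that the RPN and Fast R-CNN stages run one at a time, so the larger of the two governs.

```python
# GPU memory figures (GB) copied from the list above.
REQUIRED_GPU_MEMORY_GB = {
    "rpn": {"ZF": 2, "VGG-16": 5},
    "fast_rcnn": {"ZF": 3, "VGG-16": 8},
}


def fits_on_gpu(net: str, gpu_memory_gb: float) -> bool:
    """Return True if both stages for `net` fit in `gpu_memory_gb`.

    Assumption: the stages run one at a time, so the larger requirement governs.
    """
    needed = max(stage[net] for stage in REQUIRED_GPU_MEMORY_GB.values())
    return gpu_memory_gb >= needed


print(fits_on_gpu("ZF", 4))      # True: ZF needs at most 3 GB
print(fits_on_gpu("VGG-16", 6))  # False: Fast R-CNN with VGG-16 needs 8 GB
```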

Preparation for Testing:

  1. Run fetch_data/fetch_caffe_mex_windows_vs2013_cuda65.m to download a compiled Caffe mex (for Windows only).
  2. Run faster_rcnn_build.m
  3. Run startup.m

Testing Demo:

  1. Run fetch_data/fetch_faster_rcnn_final_model.m to download our trained models.
  2. Run experiments/script_faster_rcnn_demo.m to test a single demo image.

    • You will see timing information like that shown below. On a K40 @ 875 MHz with an Intel Xeon CPU E5-2650 v2 @ 2.60GHz, we measured the following running times for the demo images with VGG-16:
      001763.jpg (500x375): time 0.201s (resize+conv+proposal: 0.150s, nms+regionwise: 0.052s)
      004545.jpg (500x375): time 0.201s (resize+conv+proposal: 0.151s, nms+regionwise: 0.050s)
      000542.jpg (500x375): time 0.192s (resize+conv+proposal: 0.151s, nms+regionwise: 0.041s)
      000456.jpg (500x375): time 0.202s (resize+conv+proposal: 0.152s, nms+regionwise: 0.050s)
      001150.jpg (500x375): time 0.194s (resize+conv+proposal: 0.151s, nms+regionwise: 0.043s)
      mean time: 0.198s

      and with ZF net:

      001763.jpg (500x375): time 0.061s (resize+conv+proposal: 0.032s, nms+regionwise: 0.029s)
      004545.jpg (500x375): time 0.063s (resize+conv+proposal: 0.034s, nms+regionwise: 0.029s)
      000542.jpg (500x375): time 0.052s (resize+conv+proposal: 0.034s, nms+regionwise: 0.018s)
      000456.jpg (500x375): time 0.062s (resize+conv+proposal: 0.034s, nms+regionwise: 0.028s)
      001150.jpg (500x375): time 0.058s (resize+conv+proposal: 0.034s, nms+regionwise: 0.023s)
      mean time: 0.059s
    • The visual results might be different from those in the paper due to numerical variations.
    • Running time on other GPUs:
      GPU (mean time) | VGG-16 | ZF
      ----------------|--------|-----
      K40             | 198ms  | 59ms
      Titan Black     | 174ms  | 56ms
      Titan X         | 151ms  | 59ms
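
The mean time reported for VGG-16 can be recomputed from the per-image lines of the demo output. The sketch below parses lines in the format shown above and averages the total time per image. The regular expression and the `mean_time` helper are illustrative assumptions, not part of the repository.

```python
import re

# Per-image lines copied from the VGG-16 demo output above.
VGG16_LOG = """\
001763.jpg (500x375): time 0.201s (resize+conv+proposal: 0.150s, nms+regionwise: 0.052s)
004545.jpg (500x375): time 0.201s (resize+conv+proposal: 0.151s, nms+regionwise: 0.050s)
000542.jpg (500x375): time 0.192s (resize+conv+proposal: 0.151s, nms+regionwise: 0.041s)
000456.jpg (500x375): time 0.202s (resize+conv+proposal: 0.152s, nms+regionwise: 0.050s)
001150.jpg (500x375): time 0.194s (resize+conv+proposal: 0.151s, nms+regionwise: 0.043s)
"""

# Captures the total per-image time, i.e. the number that follows "): time ".
TIME_RE = re.compile(r"\): time (\d+\.\d+)s")


def mean_time(log: str) -> float:
    """Average the total per-image times (in seconds) found in `log`."""
    times = [float(m.group(1)) for m in TIME_RE.finditer(log)]
    return sum(times) / len(times)


vgg16_mean = mean_time(VGG16_LOG)
print(f"mean time: {vgg16_mean:.3f}s ({1 / vgg16_mean:.1f} images/s)")
# mean time: 0.198s (5.1 images/s)
```

The same helper applied to the ZF lines gives the 0.059s mean listed above.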

Preparation for Training:

  1. Run fetch_data/fetch_model_ZF.m to download an ImageNet-pre-trained ZF net.
  2. Run fetch_data/fetch_model_VGG16.m to download an ImageNet-pre-trained VGG-16 net.
  3. Download VOC 2007 and 2012 data to ./datasets

Training:

  1. Run experiments/script_faster_rcnn_VOC2007_ZF.m to train a model with ZF net. It runs four steps as follows:
    • Train RPN with conv layers tuned; compute RPN results on the train/test sets.
    • Train Fast R-CNN with conv layers tuned using step-1 RPN proposals; evaluate detection mAP.
    • Train RPN with conv layers fixed; compute RPN results on the train/test sets.
    • Train Fast R-CNN with conv layers fixed using step-3 RPN proposals; evaluate detection mAP.
    • Note: the entire training time is ~12 hours on K40.
  2. Run experiments/script_faster_rcnn_VOC2007_VGG16.m to train a model with VGG-16 net.
    • Note: the entire training time is ~2 days on K40.
  3. Check other scripts in ./experiments for more settings.
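
The four steps listed under the ZF training script form an alternating schedule: the first two stages tune the convolutional layers, and the last two keep them fixed. A minimal sketch of that schedule as plain data follows. The `Stage` type and its field names are hypothetical and only summarize the list above; they are not the repository's MATLAB configuration.

```python
from typing import NamedTuple, Optional


class Stage(NamedTuple):
    step: int
    network: str                        # "RPN" or "Fast R-CNN"
    conv_layers_tuned: bool             # False means the conv layers are kept fixed
    proposals_from_step: Optional[int]  # RPN step that supplies the proposals, if any


# The four training steps, in the order the script runs them.
SCHEDULE = [
    Stage(1, "RPN", True, None),
    Stage(2, "Fast R-CNN", True, 1),
    Stage(3, "RPN", False, None),
    Stage(4, "Fast R-CNN", False, 3),
]

for s in SCHEDULE:
    conv = "tuned" if s.conv_layers_tuned else "fixed"
    source = (
        f", using step-{s.proposals_from_step} RPN proposals"
        if s.proposals_from_step is not None
        else ""
    )
    print(f"step {s.step}: train {s.network} with conv layers {conv}{source}")
```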

Resources

Note: This documentation may contain links to third party websites, which are provided for your convenience only. Such third party websites are not under Microsoft’s control. Microsoft does not endorse or make any representation, guarantee or assurance regarding any third party website, content, service or product. Third party websites may be subject to the third party’s terms, conditions, and privacy statements.

  1. Experiment logs: OneDrive, DropBox, BaiduYun
  2. Region proposals of our trained RPN:

If the automatic "fetch_data" fails, you may manually download resources from:

  1. Pre-compiled Caffe mex:
  2. ImageNet-pretrained networks:
  3. Final RPN+FastRCNN models: OneDrive, DropBox, BaiduYun