endernewton / tf-faster-rcnn

Tensorflow Faster RCNN for Object Detection
https://arxiv.org/pdf/1702.02138.pdf
MIT License
coco faster-rcnn mobilenet object-detection resnet tensorboard tensorflow voc

tf-faster-rcnn is deprecated:

For a good, more up-to-date implementation of Faster/Mask RCNN with multi-GPU support, please see the example in TensorPack here.

tf-faster-rcnn

A Tensorflow implementation of the Faster RCNN detection framework by Xinlei Chen (xinleic@cs.cmu.edu). This repository is based on the Python Caffe implementation of Faster RCNN available here.

Note: Several minor modifications were made when reimplementing the framework, which may give improvements. For details about the modifications and ablative analysis, please refer to the technical report An Implementation of Faster RCNN with Study for Region Sampling. If you are seeking to reproduce the results in the original paper, please use the official code or the semi-official code. For details about the Faster RCNN architecture, please refer to the paper Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.

Detection Performance

The current code supports VGG16, Resnet V1 and Mobilenet V1 models. We mainly tested it on plain VGG16 and Resnet101 (thank you @philokey!) architectures. As a baseline, we report numbers using a single model on a single convolution layer, so no multi-scale, no multi-stage bounding box regression, no skip connections and no extra inputs are used. The only data augmentation technique is left-right flipping during training, following the original Faster RCNN. All models are released.

With VGG16 (conv5_3):

With Resnet101 (last conv4):

More Results:

Approximate baseline setup from FPN (this repository does not contain training code for FPN yet):

Note:

Figure: Displayed ground truth on Tensorboard
Figure: Displayed predictions on Tensorboard

Additional features

Additional features not mentioned in the report were added to make research life easier:

Prerequisites

Installation

  1. Clone the repository

    git clone https://github.com/endernewton/tf-faster-rcnn.git
  2. Update the -arch flag in the setup script to match your GPU

    cd tf-faster-rcnn/lib
    # Change the GPU architecture (-arch) if necessary
    vim setup.py

    GPU model                     Architecture
    TitanX (Maxwell/Pascal)       sm_52
    GTX 960M                      sm_50
    GTX 1080 (Ti)                 sm_61
    Grid K520 (AWS g2.2xlarge)    sm_30
    Tesla K80 (AWS p2.xlarge)     sm_37

    Note: You are welcome to contribute the settings on your end if you have made the code work properly on other GPUs. Also, even if you are only using CPU Tensorflow, the GPU-based NMS code is used by default, so please set USE_GPU_NMS to False to get the correct output.

  3. Build the Cython modules

    make clean
    make
    cd ..
  4. Install the Python COCO API. The code requires the API to access the COCO dataset.

    cd data
    git clone https://github.com/pdollar/coco.git
    cd coco/PythonAPI
    make
    cd ../../..
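Step 2 above involves hand-editing two settings before the build in step 3. The hypothetical helpers below (not part of tf-faster-rcnn) make those edits with sed, as a sketch. They assume that lib/setup.py passes the flag to nvcc as -arch=sm_NN, and that USE_GPU_NMS is written as "__C.USE_GPU_NMS = True" in lib/model/config.py; check both files and the resulting diff before building. The sm_ value is the GPU's CUDA compute capability with the dot removed (6.1 becomes sm_61), which is consistent with the table in step 2.

```shell
# Hypothetical helpers; paste into a POSIX shell or source them from a file.

# set_nvcc_arch SETUP_PY COMPUTE_CAPABILITY
# Replaces -arch=sm_NN with the value derived from the capability, e.g. 6.1 -> sm_61.
set_nvcc_arch() {
  arch="sm_$(printf '%s' "$2" | tr -d '.')"
  # -i.bak keeps a backup copy and works with both GNU and BSD sed
  sed -i.bak -E "s/-arch=sm_[0-9]+/-arch=${arch}/" "$1"
}

# disable_gpu_nms CONFIG_PY
# Assumes the flag is defined as "__C.USE_GPU_NMS = True" (assumed file: lib/model/config.py).
disable_gpu_nms() {
  sed -i.bak -E 's/^(__C\.USE_GPU_NMS[[:space:]]*=[[:space:]]*)True/\1False/' "$1"
}
```

For example, for a GTX 1080 (compute capability 6.1), run set_nvcc_arch lib/setup.py 6.1 from the repository root; with CPU-only Tensorflow, also run disable_gpu_nms lib/model/config.py.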

Setup data

Please follow the instructions of py-faster-rcnn here to set up the VOC and COCO datasets (part of the COCO setup is already done). The steps involve downloading data and optionally creating soft links in the data folder. Since Faster RCNN does not rely on pre-computed proposals, it is safe to ignore the steps that set up proposals.
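As a sketch of the optional soft-link step, the hypothetical helper below links an existing VOCdevkit folder into data/. It assumes, as in py-faster-rcnn, that the VOC loader looks for data/VOCdevkit<year>/VOC<year>, so a single devkit containing both VOC2007 and VOC2012 is linked under both names; adjust it if your layout differs.

```shell
# link_vocdevkit VOCDEVKIT_DIR [REPO_ROOT]
# Hypothetical helper: creates data/VOCdevkit2007 and data/VOCdevkit2012 as soft
# links to one VOCdevkit folder (assumed py-faster-rcnn layout).
link_vocdevkit() {
  # Resolve to an absolute path so the link still works from inside data/
  devkit=$(cd "$1" && pwd) || return 1
  root="${2:-.}"
  mkdir -p "$root/data"
  for year in 2007 2012; do
    # -n replaces an existing link instead of creating a new one inside it
    ln -sfn "$devkit" "$root/data/VOCdevkit$year"
  done
}
```

For example, run link_vocdevkit /path/to/VOCdevkit from the repository root.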

If you find it useful, the data/cache folder created on my side is also shared here.

Demo and Test with pre-trained models

  1. Download pre-trained model

    # Resnet101 for voc pre-trained on 07+12 set
    ./data/scripts/fetch_faster_rcnn_models.sh

    Note: if you cannot download the models through the link, or you want to try more models, you can check out the following solutions and optionally update the downloading script:

    • Another server here.
    • Google Drive here.
  2. Create a folder and a soft link to use the pre-trained model

    NET=res101
    TRAIN_IMDB=voc_2007_trainval+voc_2012_trainval
    mkdir -p output/${NET}/${TRAIN_IMDB}
    cd output/${NET}/${TRAIN_IMDB}
    ln -s ../../../data/voc_2007_trainval+voc_2012_trainval ./default
    cd ../../..
  3. Demo for testing on custom images

    # at repository root
    GPU_ID=0
    CUDA_VISIBLE_DEVICES=${GPU_ID} ./tools/demo.py

    Note: Resnet101 testing probably requires several gigabytes of GPU memory, so if you encounter memory capacity issues, please install Tensorflow with CPU support only. Refer to Issue 25.

  4. Test with pre-trained Resnet101 models

    GPU_ID=0
    ./experiments/scripts/test_faster_rcnn.sh $GPU_ID pascal_voc_0712 res101

    Note: If you cannot get the reported numbers (79.8 on my side), the NMS function was probably compiled improperly; refer to Issue 5.
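Step 2 of this section can be wrapped in a pair of hypothetical helpers: one creates the default soft link for any NET and TRAIN_IMDB, and one removes it again, since the training notes ask you to delete the soft link to the pre-trained models before training. This sketch assumes the downloaded model folder is named data/<TRAIN_IMDB>, as it is for the Resnet101 VOC 07+12 model in step 2.

```shell
# Hypothetical helpers; run from the repository root.

# link_pretrained NET TRAIN_IMDB: output/NET/TRAIN_IMDB/default -> data/TRAIN_IMDB
link_pretrained() {
  dir="output/$1/$2"
  # Refuse to touch a real directory, which may hold your own trained weights
  if [ -d "$dir/default" ] && [ ! -L "$dir/default" ]; then
    echo "$dir/default is a real directory, not linking" >&2
    return 1
  fi
  mkdir -p "$dir"
  # Same relative target as step 2: three levels up to the repository root
  ln -sfn "../../../data/$2" "$dir/default"
}

# unlink_pretrained NET TRAIN_IMDB: remove the link, but only if it is a link
unlink_pretrained() {
  link="output/$1/$2/default"
  if [ -L "$link" ]; then
    rm "$link"
  fi
}
```

For example, link_pretrained res101 voc_2007_trainval+voc_2012_trainval reproduces step 2, and unlink_pretrained with the same arguments undoes it before training.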

Train your own model

  1. Download pre-trained models and weights. The current code supports VGG16 and Resnet V1 models. Pre-trained models are provided by slim; you can get them here and put them in the data/imagenet_weights folder. For example, for the VGG16 model you can set it up like this:

    mkdir -p data/imagenet_weights
    cd data/imagenet_weights
    wget -v http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz
    tar -xzvf vgg_16_2016_08_28.tar.gz
    mv vgg_16.ckpt vgg16.ckpt
    cd ../..

    For Resnet101, you can set it up like this:

    mkdir -p data/imagenet_weights
    cd data/imagenet_weights
    wget -v http://download.tensorflow.org/models/resnet_v1_101_2016_08_28.tar.gz
    tar -xzvf resnet_v1_101_2016_08_28.tar.gz
    mv resnet_v1_101.ckpt res101.ckpt
    cd ../..
  2. Train (and test, evaluation)

    ./experiments/scripts/train_faster_rcnn.sh [GPU_ID] [DATASET] [NET]
    # GPU_ID is the GPU you want to train on
    # NET in {vgg16, res50, res101, res152} is the network arch to use
    # DATASET {pascal_voc, pascal_voc_0712, coco} is defined in train_faster_rcnn.sh
    # Examples:
    ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16
    ./experiments/scripts/train_faster_rcnn.sh 1 coco res101

    Note: Please double-check that you have deleted the soft link to the pre-trained models before training. If you find NaNs during training, please refer to Issue 86. Also, if you want multi-GPU support, check out Issue 121.

  3. Visualization with Tensorboard

    tensorboard --logdir=tensorboard/vgg16/voc_2007_trainval/ --port=7001 &
    tensorboard --logdir=tensorboard/vgg16/coco_2014_train+coco_2014_valminusminival/ --port=7002 &
  4. Test and evaluate

    ./experiments/scripts/test_faster_rcnn.sh [GPU_ID] [DATASET] [NET]
    # GPU_ID is the GPU you want to test on
    # NET in {vgg16, res50, res101, res152} is the network arch to use
    # DATASET {pascal_voc, pascal_voc_0712, coco} is defined in test_faster_rcnn.sh
    # Examples:
    ./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16
    ./experiments/scripts/test_faster_rcnn.sh 1 coco res101
  5. You can use tools/reval.sh for re-evaluation
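Step 1 of this section shows the weight download for VGG16 and Resnet101. The sketch below generalizes it to the other NET values accepted by the training script. The res50 and res152 names assume slim's resnet_v1_50_2016_08_28 and resnet_v1_152_2016_08_28 releases follow the same naming as the Resnet101 one, and the helper names are hypothetical. The checkpoint is renamed to <NET>.ckpt, matching vgg16.ckpt and res101.ckpt in step 1.

```shell
# slim_names NET: print "<tarball base name> <checkpoint file inside the tarball>"
slim_names() {
  case "$1" in
    vgg16) echo "vgg_16_2016_08_28 vgg_16.ckpt" ;;
    res50|res101|res152)
      depth="${1#res}"  # res101 -> 101
      echo "resnet_v1_${depth}_2016_08_28 resnet_v1_${depth}.ckpt" ;;
    *) echo "unsupported NET: $1" >&2; return 1 ;;
  esac
}

# fetch_imagenet_weights NET: download, unpack and rename to data/imagenet_weights/NET.ckpt
fetch_imagenet_weights() {
  names=$(slim_names "$1") || return 1
  base=${names% *}   # text before the space
  ckpt=${names#* }   # text after the space
  mkdir -p data/imagenet_weights
  # Subshell, so the caller's working directory is left unchanged
  (
    cd data/imagenet_weights &&
    wget -v "http://download.tensorflow.org/models/${base}.tar.gz" &&
    tar -xzvf "${base}.tar.gz" &&
    mv "$ckpt" "$1.ckpt"
  )
}
```

For example, run fetch_imagenet_weights res50 from the repository root.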

By default, trained networks are saved under:

output/[NET]/[DATASET]/default/

Test outputs are saved under:

output/[NET]/[DATASET]/default/[SNAPSHOT]/

Tensorboard information for train and validation is saved under:

tensorboard/[NET]/[DATASET]/default/
tensorboard/[NET]/[DATASET]/default_val/
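For the training and Tensorboard paths, [DATASET] appears to be the training imdb name rather than the script argument (compare voc_2007_trainval in the Tensorboard example and voc_2007_trainval+voc_2012_trainval in the demo setup). The hypothetical helper below prints those directories for a NET and a script DATASET; its mapping only mirrors the names used elsewhere in this README, so check it against train_faster_rcnn.sh.

```shell
# train_imdb_for DATASET: script argument -> training imdb name (assumed mapping)
train_imdb_for() {
  case "$1" in
    pascal_voc)      echo "voc_2007_trainval" ;;
    pascal_voc_0712) echo "voc_2007_trainval+voc_2012_trainval" ;;
    coco)            echo "coco_2014_train+coco_2014_valminusminival" ;;
    *)               echo "unknown DATASET: $1" >&2; return 1 ;;
  esac
}

# print_train_dirs NET DATASET: where weights and Tensorboard logs should appear
print_train_dirs() {
  imdb=$(train_imdb_for "$2") || return 1
  echo "output/$1/$imdb/default/"
  echo "tensorboard/$1/$imdb/default/"
  echo "tensorboard/$1/$imdb/default_val/"
}
```

For example, print_train_dirs vgg16 pascal_voc lists the weights directory and two Tensorboard directories whose shared parent, tensorboard/vgg16/voc_2007_trainval/, is the --logdir used in the Tensorboard step.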

The default number of training iterations is kept the same as in the original Faster RCNN for VOC 2007; however, I find it beneficial to train longer (see the report for COCO), probably because the image batch size is one. For VOC 07+12 we switch to an 80k/110k schedule following R-FCN. Also note that due to the nondeterministic nature of the current implementation, the performance can vary a bit, but in general it should be within ~1% of the reported numbers for VOC and ~0.2% of the reported numbers for COCO. Suggestions and contributions are welcome.
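To make the 80k/110k schedule concrete, the sketch below reads it as a single learning-rate drop at 80k iterations with training stopping at 110k, and also computes how many passes over the data a run makes at one image per iteration. The base rate 0.001, the decay factor 0.1, and the image count (5,011 VOC 2007 plus 11,540 VOC 2012 trainval images, doubled by left-right flipping, i.e. 33,102) are assumptions for illustration; check experiments/scripts/train_faster_rcnn.sh and the config file for the actual values. The exact iteration at which the drop happens may differ by one from this sketch.

```shell
# lr_at ITER STEP BASE_LR GAMMA: single-step schedule that decays once at STEP
lr_at() {
  awk -v it="$1" -v step="$2" -v base="$3" -v gamma="$4" \
    'BEGIN { lr = base; if (it + 0 >= step + 0) lr = base * gamma; printf "%g\n", lr }'
}

# epochs_for ITERS NUM_IMAGES: passes over the data with one image per iteration
epochs_for() {
  awk -v iters="$1" -v n="$2" 'BEGIN { printf "%.2f\n", iters / n }'
}
```

Under these assumptions, lr_at 80000 80000 0.001 0.1 prints 0.0001, and epochs_for 110000 33102 prints 3.32, i.e. only about three passes over the flipped VOC 07+12 images, which gives a feel for why longer schedules can help when the batch size is one.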

Citation

If you find this implementation or the analysis conducted in our report helpful, please consider citing:

@article{chen17implementation,
    Author = {Xinlei Chen and Abhinav Gupta},
    Title = {An Implementation of Faster RCNN with Study for Region Sampling},
    Journal = {arXiv preprint arXiv:1702.02138},
    Year = {2017}
}

Or for a formal paper, Spatial Memory Network:

@article{chen2017spatial,
    Author = {Xinlei Chen and Abhinav Gupta},
    Title = {Spatial Memory for Context Reasoning in Object Detection},
    Journal = {arXiv preprint arXiv:1704.04224},
    Year = {2017}
}

For convenience, here is the Faster RCNN citation:

@inproceedings{renNIPS15fasterrcnn,
    Author = {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun},
    Title = {Faster {R-CNN}: Towards Real-Time Object Detection
             with Region Proposal Networks},
    Booktitle = {Advances in Neural Information Processing Systems ({NIPS})},
    Year = {2015}
}