facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

FPN rpn inference time is faster than faster RCNN? #336

Closed. Tangshitao closed this issue 6 years ago.

Tangshitao commented 6 years ago

[Screenshot, 2018-04-03: table of RPN inference times for the Faster R-CNN (C4) and FPN models]

As shown in the table, RPN inference with FPN is faster than with Faster R-CNN. This seems strange to me, since FPN adds more convolution operations. Can anyone explain it?

rbgirshick commented 6 years ago

The first step is to understand where execution time is spent. To do that, use MODEL.EXECUTION_TYPE prof_dag:

python2 tools/test_net.py \
    --cfg configs/12_2017_baselines/rpn_R-50-C4_1x.yaml \
    TEST.WEIGHTS https://s3-us-west-2.amazonaws.com/detectron/35998355/12_2017_baselines/rpn_R-50-C4_1x.yaml.08_00_43.njH5oD9L/output/train/coco_2014_train:coco_2014_valminusminival/rpn/model_final.pkl \
    TEST.DATASETS "('coco_2014_minival',)" \
    MODEL.EXECUTION_TYPE prof_dag

Output after exiting:

I0403 07:36:22.714701 547318 prof_dag_net.cc:188] Measured operators over 84 net runs.
I0403 07:36:22.714779 547318 prof_dag_net.cc:205] Mean time in operator per run (stddev):
I0403 07:36:22.714784 547318 prof_dag_net.cc:209] 12.9896 ms/run ( 3.27039 ms/run) Op count per run: 43 AffineChannel
I0403 07:36:22.714797 547318 prof_dag_net.cc:209] 40.543 ms/run ( 9.98674 ms/run) Op count per run: 46 Conv
I0403 07:36:22.714802 547318 prof_dag_net.cc:209] 0.467649 ms/run ( 0.175602 ms/run) Op count per run: 1 MaxPool
I0403 07:36:22.714807 547318 prof_dag_net.cc:209] 238.597 ms/run ( 163.352 ms/run) Op count per run: 1 Python
I0403 07:36:22.714812 547318 prof_dag_net.cc:209] 8.17472 ms/run ( 1.04102 ms/run) Op count per run: 41 Relu
I0403 07:36:22.714818 547318 prof_dag_net.cc:209] 0.0666656 ms/run ( 0.143383 ms/run) Op count per run: 1 Sigmoid
I0403 07:36:22.714823 547318 prof_dag_net.cc:209] 0.00641894 ms/run (0.00590622 ms/run) Op count per run: 1 StopGradient
I0403 07:36:22.714828 547318 prof_dag_net.cc:209] 5.41326 ms/run ( 0.798915 ms/run) Op count per run: 13 Sum

If you compare this profile between the C4 and FPN models, you'll see that the FPN model does in fact spend more time executing the Conv op, but the C4 model spends significantly more time executing a Python op. Based on my background knowledge, I can hypothesize that the difference in performance comes from the NMS performed inside that Python op, implemented in lib.ops.GenerateProposalsOp.

The issue is that the nms function has O(n^2) runtime, where n is the number of proposals. The FPN version runs NMS separately for each pyramid level, with a relatively small number of proposals per level (at most 2000 by default). The C4 version runs NMS on a relatively large number of proposals that all come from a single level (12000 by default). The quadratic runtime of NMS makes a big difference here; see the sketch below.
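To make the quadratic effect concrete, here is a minimal sketch of greedy NMS plus a toy timing comparison of the two regimes. This is illustrative only, not Detectron's actual implementation in lib.ops.GenerateProposalsOp; the random box generator, the IoU threshold, and the assumption of five FPN pyramid levels are all hypothetical.

```python
import time
import numpy as np

def nms(boxes, scores, iou_thresh=0.7):
    # Greedy NMS sketch: pick the highest-scoring box, suppress overlapping
    # boxes, repeat. Each pick compares against all remaining boxes, so the
    # worst case is O(n^2) in the number of proposals.
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]
    return keep

def random_boxes(n, size=800.0):
    # Hypothetical proposal generator, for timing purposes only.
    xy = np.random.rand(n, 2) * size
    wh = np.random.rand(n, 2) * 100.0 + 1.0
    return np.hstack([xy, xy + wh])

# C4-style: one NMS call over 12000 proposals from a single level.
boxes, scores = random_boxes(12000), np.random.rand(12000)
t0 = time.time()
nms(boxes, scores)
print('1 x 12000 proposals: %.3fs' % (time.time() - t0))

# FPN-style: separate NMS per pyramid level (assuming 5 levels), at most
# 2000 proposals each. The per-call quadratic cost means far less total
# work: 5 * 2000^2 pairwise comparisons vs 12000^2 in the worst case.
t0 = time.time()
for _ in range(5):
    boxes, scores = random_boxes(2000), np.random.rand(2000)
    nms(boxes, scores)
print('5 x 2000 proposals:  %.3fs' % (time.time() - t0))
```

Under these assumptions the single 12000-proposal call does roughly 7x more pairwise IoU work than the five 2000-proposal calls combined, which is consistent with the large Python-op time in the C4 profile above.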

If you set TEST.RPN_PRE_NMS_TOP_N to a smaller value, such as 10000 or even 5000, you'll see faster runtimes (but possibly with lower proposal AR). E.g.:

python2 tools/test_net.py \
    --cfg configs/12_2017_baselines/rpn_R-50-C4_1x.yaml \
    TEST.WEIGHTS https://s3-us-west-2.amazonaws.com/detectron/35998355/12_2017_baselines/rpn_R-50-C4_1x.yaml.08_00_43.njH5oD9L/output/train/coco_2014_train:coco_2014_valminusminival/rpn/model_final.pkl \
    TEST.DATASETS "('coco_2014_minival',)" \
    MODEL.EXECUTION_TYPE prof_dag \
    TEST.RPN_PRE_NMS_TOP_N 10000