facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0

Faster RCNN (R-50-C4) configuration default setting questions #118

Closed bowenc0221 closed 6 years ago

bowenc0221 commented 6 years ago

Thanks for the great code. I found a couple of minor issues in the configuration file for R-50-C4.

After reading the config files, I found R-50-C4 was trained using most of the Fast RCNN default settings (https://github.com/facebookresearch/Detectron/blob/master/configs/12_2017_baselines/e2e_faster_rcnn_R-50-C4_1x.yaml#L17).

However, the default configuration sets FAST_RCNN.ROI_XFORM_SAMPLING_RATIO to 0 (https://github.com/facebookresearch/Detectron/blob/master/lib/core/config.py#L648). Is this a mistake?

The spatial resolution after RoIAlign is set to 7x7 in the Mask R-CNN paper, but the R-50-C4 config file sets it to 14x14. Is this another mistake?

KaimingHe commented 6 years ago

ROI_XFORM_SAMPLING_RATIO is the number of points sampled inside each bin, and by default it is 2 (so 2x2 points sampled per bin). ROI_XFORM_SAMPLING_RATIO=0 means an adaptive ratio is used; see: https://github.com/caffe2/caffe2/blob/master/caffe2/operators/roi_align_op.cu#L110
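To make the adaptive case concrete, here is a minimal sketch of how the per-bin sample count could be chosen along one axis. It follows the rule in the linked Caffe2 operator (use the ratio if positive, otherwise `ceil(roi_extent / pooled_size)`). The function name `bin_grid_size` and the example RoI sizes are hypothetical, and the RoI extent is assumed to be already scaled to feature-map units.

```python
import math


def bin_grid_size(roi_extent, pooled_size, sampling_ratio):
    """Number of sample points along one axis of each output bin.

    roi_extent: RoI height or width in feature-map units (assumed >= 1).
    pooled_size: output resolution along that axis (e.g. 7 or 14).
    sampling_ratio: fixed count if > 0, adaptive if 0.
    """
    if sampling_ratio > 0:
        # Fixed sampling: the same number of points for every RoI.
        return sampling_ratio
    # Adaptive sampling: roughly one point per feature-map cell in the bin.
    return int(math.ceil(roi_extent / pooled_size))


# Fixed ratio of 2 gives 2x2 points per bin regardless of RoI size.
fixed = bin_grid_size(56.0, 14, 2)
# Adaptive: a large RoI gets more points per bin, a small one gets one.
adaptive_large = bin_grid_size(56.0, 14, 0)
adaptive_small = bin_grid_size(10.0, 14, 0)
```

With the adaptive setting, large RoIs are sampled more densely and small RoIs fall back to a single point per bin along each axis.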

Regarding R-50-C4, it uses the original MSRA ResNet-50 model (and weights), where the stride=2 op in a residual block is placed in the first 1x1 layer (instead of the 3x3 layer, as in most subsequent work). So the 14x14 RoIAlign output is immediately subsampled to 7x7 by the following residual block. This is similar to using a 7x7 RoIAlign and setting the stride of the following residual block to 1.
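The spatial-size arithmetic can be checked with a short sketch. It assumes a standard bottleneck layout (1x1, 3x3 with padding 1, 1x1) and the usual convolution output-size formula. The helper names `conv_out_size` and `bottleneck_out_size` are hypothetical and are not taken from Detectron.

```python
def conv_out_size(size, kernel, stride, pad):
    """Output size along one axis for a standard convolution."""
    return (size + 2 * pad - kernel) // stride + 1


def bottleneck_out_size(size, first_stride):
    """Spatial size after one bottleneck block whose stride sits in
    the first 1x1 conv (the original MSRA ResNet layout)."""
    size = conv_out_size(size, kernel=1, stride=first_stride, pad=0)
    size = conv_out_size(size, kernel=3, stride=1, pad=1)
    size = conv_out_size(size, kernel=1, stride=1, pad=0)
    return size


# 14x14 RoIAlign followed by a stride-2 block.
from_14 = bottleneck_out_size(14, first_stride=2)
# 7x7 RoIAlign followed by a stride-1 block.
from_7 = bottleneck_out_size(7, first_stride=1)
print(from_14, from_7)
```

Both paths end at the same 7x7 resolution. They are similar rather than identical: a stride-2 1x1 conv keeps only every other position of the 14x14 map, so the values reaching later layers are not the same as those of a directly sampled 7x7 RoIAlign.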

lucasjinreal commented 5 years ago

What's the meaning of C4 here?