facebookresearch / maskrcnn-benchmark

Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.
MIT License

Confused about some Pooler parameters #544

Closed salehiac closed 5 years ago

salehiac commented 5 years ago

❓ Questions and Help

Hi, I'm not a deep learning expert, so I apologize if this is a trivial question.

I'm a bit confused by the

_C.MODEL.ROI_BOX_HEAD.POOLER_SCALES = (0.25, 0.125, 0.0625, 0.03125)
_C.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO = 2

parameters in the yacs config files. I looked at the Pooler and the relevant ROIAlign CUDA code, but I'm still not sure how these values are computed and what they mean. Can somebody please explain them? Thanks.

LeviViana commented 5 years ago

No need to apologize, there are no trivial questions.

These scales are the downsampling factors introduced by the backbone's strides: each value is 1 / (cumulative stride) of one feature level relative to the input image. BTW, a good grasp of the ResNet and ResNeXt architectures makes this explanation easier to follow.

For instance, suppose you found an RoI with coordinates [0, 0, 64, 64] in the input image. Suppose also that you want to pool its features from all of the backbone's levels (here, a backbone is a ResNet or ResNeXt architecture).

So, since there is a stride of 2 in the conv1 layer and another stride of 2 in the max-pooling that closes the first block (the stem), the first feature map is 4x smaller than the original image, hence a scale of 0.25. Since there is a stride of 2 between all the following convolution blocks of the backbone, the scale is halved at each level: 0.125, 0.0625, and 0.03125.
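To make the arithmetic concrete, here is a minimal sketch of how such scales follow from the strides. The `STAGE_STRIDES` tuple and the `pooler_scales` helper are hypothetical names chosen for illustration, not part of maskrcnn-benchmark; the stride values simply restate the assumption above (a combined stride of 4 for the stem, then 2 per block).

```python
# Assumed per-stage strides for a ResNet-style backbone:
# the stem (conv1 + max-pool) downsamples by 4, every later stage by 2.
STAGE_STRIDES = (4, 2, 2, 2)


def pooler_scales(stage_strides):
    """Return 1 / (cumulative stride) for each feature level."""
    scales = []
    cumulative = 1
    for stride in stage_strides:
        cumulative *= stride             # total downsampling so far
        scales.append(1.0 / cumulative)  # feature-map size relative to the image
    return tuple(scales)


print(pooler_scales(STAGE_STRIDES))  # (0.25, 0.125, 0.0625, 0.03125)
```

A backbone with different strides would need a different `POOLER_SCALES` tuple, which is why the value lives in the config rather than being hard-coded.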

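Hence, the coordinates of your RoI will be:

- [0, 0, 16, 16] at scale 0.25
- [0, 0, 8, 8] at scale 0.125
- [0, 0, 4, 4] at scale 0.0625
- [0, 0, 2, 2] at scale 0.03125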
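Projecting an RoI onto a feature level amounts to multiplying its image coordinates by that level's scale. The sketch below shows this for a second, hypothetical RoI that does not start at the origin; `project_roi` is an illustrative helper, not a function from the repository, and it assumes (x1, y1, x2, y2) box coordinates.

```python
POOLER_SCALES = (0.25, 0.125, 0.0625, 0.03125)


def project_roi(roi, scale):
    """Map (x1, y1, x2, y2) image coordinates onto a feature map."""
    return tuple(coord * scale for coord in roi)


roi = (32, 48, 96, 112)  # hypothetical RoI in image coordinates
for scale in POOLER_SCALES:
    print(scale, project_roi(roi, scale))
# 0.25 (8.0, 12.0, 24.0, 28.0)
# 0.125 (4.0, 6.0, 12.0, 14.0)
# 0.0625 (2.0, 3.0, 6.0, 7.0)
# 0.03125 (1.0, 1.5, 3.0, 3.5)
```

Note that the coarsest level yields fractional coordinates. RoIAlign keeps them as they are instead of rounding, which is where the bilinear interpolation comes in.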

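The sampling_ratio parameter sets how many sampling points RoIAlign takes per output bin along each axis. With POOLER_SAMPLING_RATIO = 2, each bin is sampled on a 2x2 grid: every sample is computed by bilinear interpolation of the feature map, and the 4 values are averaged to give the bin's output.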
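As a rough illustration of where those samples fall, the sketch below lists the sampling points of a single output bin, placing them at the centers of a regular sampling_ratio x sampling_ratio sub-grid inside the bin. `bin_sample_points` and the `pooled_size` value are assumptions made for this example; the real kernel additionally interpolates the feature map bilinearly at each point and averages the results.

```python
def bin_sample_points(roi, pooled_size, sampling_ratio, bin_y, bin_x):
    """Return the (y, x) sampling points of one output bin.

    roi is (x1, y1, x2, y2), already projected onto the feature map.
    """
    x1, y1, x2, y2 = roi
    bin_h = (y2 - y1) / pooled_size  # height of one output bin
    bin_w = (x2 - x1) / pooled_size  # width of one output bin
    points = []
    for iy in range(sampling_ratio):
        # centers of equal sub-cells along y inside the bin
        y = y1 + bin_y * bin_h + (iy + 0.5) * bin_h / sampling_ratio
        for ix in range(sampling_ratio):
            x = x1 + bin_x * bin_w + (ix + 0.5) * bin_w / sampling_ratio
            points.append((y, x))
    return points


# RoI [0, 0, 16, 16] on the 0.25-scale level, pooled to a hypothetical 4x4 output
print(bin_sample_points((0, 0, 16, 16), pooled_size=4, sampling_ratio=2,
                        bin_y=0, bin_x=0))
# [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (3.0, 3.0)]
```

A larger sampling_ratio gives a denser grid per bin, at the cost of more interpolations per RoI.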

salehiac commented 5 years ago

Thank you very much @LeviViana . That is a very good explanation!

fmassa commented 5 years ago

Thanks for your great explanation @LeviViana !

abcxs commented 5 years ago

thanks