No need to apologize, there are no trivial questions.
These scales are the downsampling factors introduced by the strides of the backbone. BTW, a good understanding of the ResNet and ResNeXt architectures will help you follow this explanation.
For instance, suppose you found a RoI with coordinates [0, 0, 64, 64] in the input image, and that you want to pool its features from all of the backbone's levels (here, the backbone is a ResNet or ResNeXt architecture).
So, since conv1 has a stride of 2 and the max-pooling at the end of the first (stem) block has another stride of 2, the first-level feature map is 4x smaller than the original image along each side, hence a scale of 0.25. Since there is a stride of 2 between each pair of consecutive convolution stages of the backbone, the scale gets divided by 2 at each level, giving 0.125, 0.0625 and 0.03125.
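The halving described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a ResNet-style backbone whose stem downsamples by 4 and whose later stages each downsample by 2; the function name pooler_scales is hypothetical and not part of maskrcnn-benchmark.

```python
def pooler_scales(stem_stride=4, num_levels=4, stage_stride=2):
    """Return 1 / (cumulative stride) for each backbone feature level.

    Assumption: the stem (conv1 + max-pool) downsamples by stem_stride,
    and every following stage downsamples by another stage_stride.
    """
    scales = []
    stride = stem_stride
    for _ in range(num_levels):
        scales.append(1.0 / stride)  # spatial scale of this level
        stride *= stage_stride       # next stage halves the resolution again
    return tuple(scales)


print(pooler_scales())  # (0.25, 0.125, 0.0625, 0.03125)
```

These are exactly the values of POOLER_SCALES: cumulative strides of 4, 8, 16 and 32.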
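Hence, the coordinates of your RoI will be:
[0, 0, 16, 16] in the first-level feature map (scale 0.25)
[0, 0, 8, 8] in the second-level feature map (scale 0.125)
[0, 0, 4, 4] in the third-level feature map (scale 0.0625)
[0, 0, 2, 2] in the fourth-level feature map (scale 0.03125)
The sampling_ratio parameter determines how many sampling points the RoIAlign algorithm uses per output bin: each bin averages a grid of sampling_ratio by sampling_ratio values, each obtained by bilinear interpolation (so 2 x 2 = 4 samples with POOLER_SAMPLING_RATIO = 2). A value of 0 or less makes the number of samples adaptive to the RoI size instead.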
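To make both the coordinate scaling and sampling_ratio concrete, here is a simplified single-channel RoIAlign sketch in plain Python. It is loosely modeled on the structure of the RoIAlign forward pass (scale the RoI by spatial_scale, split it into pooled_size x pooled_size bins, average a grid of bilinearly interpolated samples per bin), but the function names roi_align and bilinear, the (x1, y1, x2, y2) RoI layout and the nested-list feature map are assumptions for illustration, not the library's API.

```python
import math


def bilinear(feat, y, x):
    """Bilinearly interpolate the 2-D list `feat` at the real-valued point (y, x)."""
    h, w = len(feat), len(feat[0])
    if y < -1.0 or y > h or x < -1.0 or x > w:
        return 0.0  # sample falls outside the feature map
    y, x = max(y, 0.0), max(x, 0.0)
    y0, x0 = int(y), int(x)
    if y0 >= h - 1:  # clamp at the bottom border
        y0 = y1 = h - 1
        y = float(y0)
    else:
        y1 = y0 + 1
    if x0 >= w - 1:  # clamp at the right border
        x0 = x1 = w - 1
        x = float(x0)
    else:
        x1 = x0 + 1
    ly, lx = y - y0, x - x0
    hy, hx = 1.0 - ly, 1.0 - lx
    return (hy * hx * feat[y0][x0] + hy * lx * feat[y0][x1]
            + ly * hx * feat[y1][x0] + ly * lx * feat[y1][x1])


def roi_align(feat, roi, spatial_scale, pooled_size, sampling_ratio):
    """Pool one RoI (x1, y1, x2, y2 in image coordinates) from one feature level."""
    # Image coordinates -> feature-map coordinates of this level.
    x1, y1, x2, y2 = (c * spatial_scale for c in roi)
    roi_w, roi_h = max(x2 - x1, 1.0), max(y2 - y1, 1.0)
    bin_w, bin_h = roi_w / pooled_size, roi_h / pooled_size
    # sampling_ratio samples per bin along each axis; <= 0 means adaptive.
    grid_h = sampling_ratio if sampling_ratio > 0 else math.ceil(roi_h / pooled_size)
    grid_w = sampling_ratio if sampling_ratio > 0 else math.ceil(roi_w / pooled_size)
    out = []
    for ph in range(pooled_size):
        row = []
        for pw in range(pooled_size):
            total = 0.0
            for iy in range(grid_h):
                y = y1 + ph * bin_h + (iy + 0.5) * bin_h / grid_h
                for ix in range(grid_w):
                    x = x1 + pw * bin_w + (ix + 0.5) * bin_w / grid_w
                    total += bilinear(feat, y, x)
            row.append(total / (grid_h * grid_w))  # average of all samples in the bin
        out.append(row)
    return out


# A 16x16 first-level map whose values equal their x coordinate.
feat = [[float(x) for x in range(16)] for _ in range(16)]
print(roi_align(feat, [0, 0, 64, 64], 0.25, pooled_size=2, sampling_ratio=2))
# [[4.0, 12.0], [4.0, 12.0]]
```

With scale 0.25 the example RoI becomes [0, 0, 16, 16] on this level, each of the 2x2 output bins covers 8x8 feature-map cells, and sampling_ratio=2 places 4 sample points in each bin (at offsets 2 and 6 along each axis), whose interpolated values are averaged.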
Thank you very much @LeviViana . That is a very good explanation!
Thanks for your great explanation @LeviViana !
thanks
❓ Questions and Help
Hi, I'm not a deep learning expert, so I apologize if this is a trivial question.
I'm a bit confused by the following parameters in the yacs config files:
_C.MODEL.ROI_BOX_HEAD.POOLER_SCALES = (0.25, 0.125, 0.0625, 0.03125)
_C.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO = 2
I looked at the Pooler and the relevant roiAlign CUDA code, but I'm still not sure how these values are computed and what they mean. Can somebody please explain them? Thanks.