Closed: xiaohu2015 closed this issue 2 years ago.
Dear @xiaohu2015
You can use the same method as in the XCiT paper (https://arxiv.org/abs/2106.09681), where we up-/down-scaled intermediate feature maps to match the resolutions that Mask RCNN expects.
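Here is a minimal sketch of that idea. It assumes the trunk yields four stride-16 feature maps taken at different depths; which layers to tap is an assumption and is not stated in this thread. To keep it short, it uses parameter-free resampling: nearest-neighbour upsampling and 2x2 max pooling. A real detection backbone would more likely use learned layers (e.g. transposed convolutions) for the upsampling. NumPy stands in for a deep learning framework, and `build_pyramid` and the other helpers are hypothetical names.

```python
from typing import Dict, List

import numpy as np


def nearest_upsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2 on a (C, H, W) map (H and W must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def build_pyramid(feats: List[np.ndarray]) -> Dict[str, np.ndarray]:
    """Hypothetical helper: turn four stride-16 maps into p2..p5.

    feats: four (C, H/16, W/16) arrays, ordered from shallow to deep
    (the choice of layers is an assumption).
    """
    f_a, f_b, f_c, f_d = feats
    return {
        "p2": nearest_upsample(f_a, 4),  # 1/16 -> 1/4
        "p3": nearest_upsample(f_b, 2),  # 1/16 -> 1/8
        "p4": f_c,                       # already at 1/16
        "p5": max_pool_2x2(f_d),         # 1/16 -> 1/32
    }


if __name__ == "__main__":
    C, H, W = 8, 512, 512  # hypothetical channel count and input size
    rng = np.random.default_rng(0)
    feats = [rng.standard_normal((C, H // 16, W // 16)) for _ in range(4)]
    for name, p in build_pyramid(feats).items():
        # Print each level's shape and its stride relative to the input image.
        print(name, p.shape, "stride", H // p.shape[1])
```

The resulting p2 to p5 maps sit at strides 4, 8, 16 and 32, so they could then be passed to the FPN neck of Mask RCNN in the usual way.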
The following papers have actually adopted this technique:
OK, thanks
In the paper "Augmenting Convolutional networks with attention-based aggregation", a simple PatchConvNet is presented. However, PatchConvNet only outputs a feature map at 1/16 of the original image size, while the Mask RCNN model needs multi-level features, e.g. p2, p3, p4 and p5 (at 1/4, 1/8, 1/16 and 1/32 of the input resolution). So how can PatchConvNet be adapted to Mask RCNN? Do we need to downsample or upsample the output of PatchConvNet to get multi-level features? @jegou @TouvronHugo @Celebio