facebookresearch / deit

Official DeiT repository
Apache License 2.0

A question about MaskRCNN with PatchConvNet? #138

Closed: xiaohu2015 closed this issue 2 years ago

xiaohu2015 commented 2 years ago

The paper "Augmenting Convolutional networks with attention-based aggregation" presents a simple PatchConvNet. However, PatchConvNet only outputs a feature map at 1/16 of the original image resolution, whereas Mask R-CNN needs multi-level features, e.g. P2, P3, P4 and P5. How can PatchConvNet be adapted to Mask R-CNN? Do we need to downsample or upsample the output of PatchConvNet to get multi-level features? @jegou @TouvronHugo @Celebio

jegou commented 2 years ago

Dear @xiaohu2015

You can use the same method as in the XCiT paper (https://arxiv.org/abs/2106.09681), where we up-/down-scaled intermediate feature maps to match the resolutions Mask R-CNN expects.
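A minimal sketch of the resampling idea, assuming four stride-16 feature maps have already been taken from different PatchConvNet blocks (which blocks to tap is an assumption here). It maps them to strides 4, 8, 16 and 32, the resolutions of P2 to P5. In a real detector the resampling layers would typically be learned (e.g. transposed convolutions) and trained together with Mask R-CNN; this sketch swaps in parameter-free nearest-neighbour upsampling and 2x2 max pooling so it runs with NumPy alone. The function names and the 384-channel, 14x14 example are hypothetical.

```python
from __future__ import annotations

import numpy as np


def nearest_upsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Repeat each spatial position `factor` times along H and W (x is C x H x W)."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """Non-overlapping 2x2 max pooling; assumes H and W are even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def build_pyramid(features: list[np.ndarray]) -> dict[str, np.ndarray]:
    """Hypothetical helper: turn four stride-16 maps into strides 4, 8, 16, 32."""
    f2, f3, f4, f5 = features
    return {
        "p2": nearest_upsample(f2, 4),  # stride 16 -> 4
        "p3": nearest_upsample(f3, 2),  # stride 16 -> 8
        "p4": f4,                       # stride 16 kept as-is
        "p5": max_pool_2x2(f5),         # stride 16 -> 32
    }


# Hypothetical example: 224x224 input, 384 channels, stride-16 maps of size 14x14.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((384, 14, 14)) for _ in range(4)]
for name, fmap in build_pyramid(feats).items():
    print(name, fmap.shape)
# p2 (384, 56, 56)
# p3 (384, 28, 28)
# p4 (384, 14, 14)
# p5 (384, 7, 7)
```

The resulting dictionary can then be fed to an FPN neck in place of the outputs of a hierarchical backbone. Inputs are usually padded so that H and W are multiples of 32, which keeps the max-pooling step exact.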

The following papers have actually adopted this technique:

xiaohu2015 commented 2 years ago

OK, thanks