Open owen24819 opened 1 year ago
Yes. By default, the largest feature map we use in the encoder is at 1/8 resolution. The largest map we have used in the encoder is at 1/4 resolution (see our 5-scale model), which improves performance by around 0.5 AP.
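For readers unfamiliar with the "1/4" and "1/8" notation, here is a small sketch of what those scales mean in spatial size. The 800 x 1216 input and the ceil-division rounding are assumptions for illustration only; the exact sizes in MaskDINO depend on the backbone and its padding.

```python
import math


def feature_map_size(height, width, stride):
    """Spatial size of a feature map at the given stride.

    Assumes ceil division, which is typical for padded strided convolutions.
    """
    return math.ceil(height / stride), math.ceil(width / stride)


def scale_sizes(height, width, strides):
    """Map each scale name (e.g. "1/8") to its (height, width)."""
    return {f"1/{s}": feature_map_size(height, width, s) for s in strides}


# Hypothetical input resolution, chosen only for the example.
sizes = scale_sizes(800, 1216, strides=(4, 8, 16, 32))
for name, (h, w) in sizes.items():
    print(f"{name}: {h} x {w} = {h * w} positions")
```

Under these assumptions the 1/4 map has four times as many positions as the 1/8 map, which is why adding it to the encoder increases cost as well as accuracy.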
Hi, thanks for the response. I was specifically wondering whether you fed multiple encoder feature maps to the segmentation head, e.g. the 1/4, 1/8 and 1/16 encoder maps. The code I highlighted seems to be written as if you may have tried this.
Hi,
Nice work. I see that you use the highest-resolution backbone feature map and encoder feature map to generate the pixel embedding map. Did you try including other, lower-resolution feature maps (backbone or encoder), and did that improve performance?
Thanks, Owen
https://github.com/IDEA-Research/MaskDINO/blob/76c8e4536ad8f01ed97f71fe47dd05518b5dbdaf/maskdino/modeling/pixel_decoder/maskdino_encoder.py#L415-L428
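To make the question concrete, here is a toy sketch of what fusing several encoder maps into one pixel embedding map could look like. It is not MaskDINO's implementation: the function names `upsample_nearest` and `fuse_top_down` are hypothetical, the maps are single-channel Python lists, and the fusion is a plain sum with no learned lateral or output convolutions.

```python
def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a 2D list-of-lists map to (out_h, out_w)."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def fuse_top_down(maps):
    """Sum all maps at the resolution of maps[0] (the highest-resolution map).

    Assumption: maps are ordered from highest to lowest resolution.
    """
    out_h, out_w = len(maps[0]), len(maps[0][0])
    fused = [[0.0] * out_w for _ in range(out_h)]
    for fmap in maps:
        resized = upsample_nearest(fmap, out_h, out_w)
        for r in range(out_h):
            for c in range(out_w):
                fused[r][c] += resized[r][c]
    return fused


# Hypothetical single-channel maps at 1/4, 1/8 and 1/16 of a 32 x 32 input.
quarter = [[1.0] * 8 for _ in range(8)]
eighth = [[2.0] * 4 for _ in range(4)]
sixteenth = [[3.0] * 2 for _ in range(2)]

fused = fuse_top_down([quarter, eighth, sixteenth])
print(len(fused), len(fused[0]))  # fused map keeps the 1/4-scale size
```

In a real PyTorch model the resize step would typically be `torch.nn.functional.interpolate` on multi-channel tensors, with a convolution applied to each map before and after the sum.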