IDEA-Research / MaskDINO

[CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"
Apache License 2.0

Question about the segmentation branch #39

Open owen24819 opened 1 year ago

owen24819 commented 1 year ago

Hi,

Nice work. I see that you use the highest-resolution backbone feature map and encoder feature map to generate the pixel embedding map. Did you try including other, lower-resolution feature maps (backbone or encoder), and did that improve performance?

Thanks, Owen

https://github.com/IDEA-Research/MaskDINO/blob/76c8e4536ad8f01ed97f71fe47dd05518b5dbdaf/maskdino/modeling/pixel_decoder/maskdino_encoder.py#L415-L428
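For context, a minimal sketch of the fusion pattern the linked lines implement, assuming a standard FPN-style lateral/output convolution; the module and tensor names below are illustrative, not the repository's exact code:

```python
# Sketch only: the highest-resolution backbone feature is fused with an
# upsampled encoder feature map to produce the pixel embedding map used
# for mask prediction. Names and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelEmbeddingSketch(nn.Module):
    def __init__(self, backbone_dim=256, conv_dim=256, mask_dim=256):
        super().__init__()
        self.lateral_conv = nn.Conv2d(backbone_dim, conv_dim, kernel_size=1)  # project 1/4 backbone feature
        self.output_conv = nn.Conv2d(conv_dim, conv_dim, kernel_size=3, padding=1)
        self.mask_features = nn.Conv2d(conv_dim, mask_dim, kernel_size=1)     # final pixel embedding map

    def forward(self, backbone_feat_1_4, encoder_feat_1_8):
        cur = self.lateral_conv(backbone_feat_1_4)
        # FPN-style fusion: upsample the encoder map to 1/4 resolution and add it
        up = F.interpolate(encoder_feat_1_8, size=cur.shape[-2:],
                           mode="bilinear", align_corners=False)
        y = self.output_conv(cur + up)
        # per-pixel embeddings; masks come from dotting these with query embeddings
        return self.mask_features(y)
```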

FengLi-ust commented 1 year ago

Yes. By default we use the 1/8 map in the encoder. The largest map we use is the 1/4 encoder map (see our 5-scale model), which improves performance by around 0.5 AP.
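As I read this reply, the scale sets are roughly as follows (assumed from the standard DINO-style multi-scale setup, not verified against every config):

```python
# Assumed encoder input scales per configuration; only the 1/4 entry for the
# 5-scale model is stated in the reply above, the rest follows DINO convention.
ENCODER_SCALES = {
    "4scale": [1/8, 1/16, 1/32, 1/64],
    "5scale": [1/4, 1/8, 1/16, 1/32, 1/64],  # reported ~+0.5 AP
}
```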

owen24819 commented 1 year ago

Hi, thanks for the response. I was specifically wondering whether you fed multiple encoder feature maps to the segmentation head, e.g. the 1/4, 1/8, and 1/16 encoder maps together. In the code I linked above, it looks as if this may have been tried at some point.
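A hedged sketch of the variant being asked about here (not something the repository is confirmed to do): fuse several encoder scales, e.g. 1/4, 1/8, and 1/16, into one pixel embedding map by upsampling everything to 1/4 and summing before the final projection. All names are illustrative.

```python
# Sketch only: multi-scale encoder features fused into a single pixel
# embedding map. This is the hypothetical variant under discussion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePixelEmbeddingSketch(nn.Module):
    def __init__(self, conv_dim=256, mask_dim=256, num_scales=3):
        super().__init__()
        # one lateral projection per encoder scale
        self.laterals = nn.ModuleList(
            nn.Conv2d(conv_dim, conv_dim, kernel_size=1) for _ in range(num_scales)
        )
        self.mask_features = nn.Conv2d(conv_dim, mask_dim, kernel_size=1)

    def forward(self, encoder_feats):
        # encoder_feats: list ordered high -> low resolution, e.g. [1/4, 1/8, 1/16]
        target_size = encoder_feats[0].shape[-2:]
        fused = 0
        for feat, lateral in zip(encoder_feats, self.laterals):
            x = lateral(feat)
            if x.shape[-2:] != target_size:
                # bring every scale up to the highest resolution before summing
                x = F.interpolate(x, size=target_size, mode="bilinear", align_corners=False)
            fused = fused + x
        return self.mask_features(fused)
```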