> This seems somewhat contradictory to the purpose of 3DMF, which is to filter out "unnecessary" points in the feature map. If the original feature is used to form the final output, the output will no longer be a sparse feature map, which may adversely affect the results when NMS is not applied.
The experiments and visualizations show that the feature maps do become sparser. Even so, our solution is certainly not a perfect one, and suggestions for improvement are always welcome.
After 3DMF, the filtered feature is added to the original feature to form the final output:
https://github.com/Megvii-BaseDetection/DeFCN/blob/a82393e290455fd11a1a088723a8050791b44c15/playground/detection/coco/poto.res50.fpn.coco.800size.3x_ms.3dmf/fcos.py#L607
This seems somewhat contradictory to the purpose of 3DMF, which is to filter out "unnecessary" points in the feature map. If the original feature is used to form the final output, the output will no longer be a sparse feature map, which may adversely affect the results when NMS is not applied.
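To make the concern concrete, here is a toy sketch in plain Python. It assumes a simplified, non-learnable 3D max filter that keeps a point only if it is the maximum of its (scale, row, column) neighbourhood, with all levels already at the same H×W. The module in the linked code may differ (for example in how levels of different resolution are aligned), and the helper names `max_filter_3d`, `add` and `count_nonzero` are hypothetical.

```python
from typing import List

# Hypothetical layout: volume[scale][row][col], all scales share the same H x W.
Volume = List[List[List[float]]]


def max_filter_3d(x: Volume, radius: int = 1) -> Volume:
    """Keep a value only if it is the max of its 3D neighbourhood; zero it otherwise."""
    n_scales, height, width = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(n_scales)]
    for s in range(n_scales):
        for r in range(height):
            for c in range(width):
                # Neighbourhood spans adjacent scales as well as adjacent pixels.
                neighbourhood = [
                    x[ss][rr][cc]
                    for ss in range(max(0, s - radius), min(n_scales, s + radius + 1))
                    for rr in range(max(0, r - radius), min(height, r + radius + 1))
                    for cc in range(max(0, c - radius), min(width, c + radius + 1))
                ]
                if x[s][r][c] == max(neighbourhood):
                    out[s][r][c] = x[s][r][c]
    return out


def add(a: Volume, b: Volume) -> Volume:
    """Element-wise sum, i.e. the 'filtered + original' fusion discussed above."""
    return [[[av + bv for av, bv in zip(ra, rb)] for ra, rb in zip(sa, sb)]
            for sa, sb in zip(a, b)]


def count_nonzero(x: Volume) -> int:
    return sum(1 for level in x for row in level for v in row if v != 0)


# Two toy scales of 3x3 scores.
x = [
    [[0.1, 0.2, 0.1], [0.2, 0.9, 0.3], [0.1, 0.3, 0.2]],
    [[0.2, 0.1, 0.1], [0.1, 0.4, 0.2], [0.1, 0.2, 0.8]],
]
filtered = max_filter_3d(x)
fused = add(filtered, x)

print(count_nonzero(x), count_nonzero(filtered), count_nonzero(fused))  # 18 1 18
```

In this toy case the filtered map keeps a single peak, but after adding the original feature back every entry is non-zero again, which is the loss of sparsity the issue describes. Whether that matters in practice depends on what the subsequent layers learn from the fused feature, which is what the experiments mentioned in the reply speak to.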