Closed aa200647963 closed 2 years ago
Thank you for the question. The short answer is yes. We only use bbox features for two-stage methods. We did try to use mask features in the following two ways:
We found that both practices brought only negligible improvement, so we did not include them in the end.
@LilyDaytoy Hi Wenxuan, could you push a branch with mask-aware ROI Align, which we tried before? The branch is only for participants' reference.
@aa200647963 Hi, for using mask features in PSG 2-stage pipeline, you can check my branch https://github.com/LilyDaytoy/OpenPSG/tree/mask_roi for reference :)
For details of how masked ROI feats are extracted, you can check the `roi_forward_with_mask` function in `openpsg/models/roi_extractors/visual_spatial.py`. I basically masked the feature maps with the binary masks of all objects and stacked them together, then reused the RoIAlign module in mmdet to crop out the "masked feature maps" with the corresponding rois.
The final return of `roi_forward_with_mask` has 512 channels: 256 channels of bounding box features and 256 channels of mask features, stacked together.
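To make the idea concrete, here is a minimal NumPy sketch of the masking-then-cropping step described above. It is not the code from the branch: `crop_and_pool` is a simplified crop-and-average-pool stand-in for mmdet's RoIAlign, `roi_feats_with_mask` is a hypothetical name, and the masks are assumed to be already resized to the feature-map resolution, with boxes given in feature-map coordinates.

```python
import numpy as np


def crop_and_pool(feat, box, out_size=7):
    """Simplified stand-in for RoIAlign (assumption, not mmdet's op).

    Crops `box` = (x1, y1, x2, y2) from `feat` of shape (C, H, W) and
    average-pools the crop into an out_size x out_size grid.
    """
    c, h, w = feat.shape
    x1, y1, x2, y2 = box
    # Integer bin edges along each axis.
    xs = np.linspace(x1, x2, out_size + 1).astype(int)
    ys = np.linspace(y1, y2, out_size + 1).astype(int)
    out = np.zeros((c, out_size, out_size), dtype=feat.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # Keep every bin inside the map and at least one pixel wide.
            y_lo = min(max(ys[i], 0), h - 1)
            x_lo = min(max(xs[j], 0), w - 1)
            y_hi = max(min(ys[i + 1], h), y_lo + 1)
            x_hi = max(min(xs[j + 1], w), x_lo + 1)
            out[:, i, j] = feat[:, y_lo:y_hi, x_lo:x_hi].mean(axis=(1, 2))
    return out


def roi_feats_with_mask(feat, boxes, masks, out_size=7):
    """Return (N, 2C, out_size, out_size): bbox feats + masked feats per object."""
    bbox_feats, mask_feats = [], []
    for box, mask in zip(boxes, masks):
        # Plain bounding box features from the unmasked feature map.
        bbox_feats.append(crop_and_pool(feat, box, out_size))
        # Zero out everything outside this object's binary mask, then crop
        # the same box from the masked feature map.
        masked = feat * mask[None]
        mask_feats.append(crop_and_pool(masked, box, out_size))
    # Stack along the channel axis: C bbox channels followed by C mask channels.
    return np.concatenate([np.stack(bbox_feats), np.stack(mask_feats)], axis=1)


rng = np.random.default_rng(0)
feat = rng.standard_normal((256, 32, 32)).astype(np.float32)  # one FPN-like level
boxes = np.array([[2, 3, 18, 20], [10, 8, 30, 28]], dtype=np.float32)
masks = np.zeros((2, 32, 32), dtype=np.float32)
masks[0, 5:15, 4:16] = 1
masks[1, 10:26, 12:28] = 1

out = roi_feats_with_mask(feat, boxes, masks)
print(out.shape)  # (2, 512, 7, 7)
```

With a 256-channel feature map, each object ends up with 256 + 256 = 512 channels, which is why the downstream `in_channels` has to change when mask features are enabled.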
To enable mask features in the pipeline, you can check `configs/motifs/panoptic_fpn_r50_fpn_1x_predcls_psg.py` for example (`require_masked_feats=True`; `in_channels=512` in `bbox_roi_extractor`).
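As a rough illustration of those two settings in mmdet-style Python config syntax: this fragment is not copied from the branch, and this thread does not say where `require_masked_feats` sits in the config, so it is shown as a standalone variable. Please check the linked config file for the real layout.

```python
# Illustrative only; see the actual config file in the mask_roi branch.
bbox_channels = 256  # bounding box feature channels
mask_channels = 256  # mask feature channels

bbox_roi_extractor = dict(
    # 256 bbox channels + 256 mask channels stacked together
    in_channels=bbox_channels + mask_channels,
)

# Assumption: the exact location of this flag in the config is not stated here.
require_masked_feats = True
```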
There are also some detailed modifications for unifying mask size and type. You can check all the modifications here: https://github.com/LilyDaytoy/OpenPSG/compare/Jingkang50:OpenPSG:main...mask_roi
This branch has not been tested yet ;) but hope this could help
In the config file, `with_visual_mask=False`. So, do only the Transformer-based methods use the mask features?