Jingkang50 / OpenPSG

Benchmarking Panoptic Scene Graph Generation (PSG), ECCV'22
https://psgdataset.org
MIT License

Motifs, VCTree don't use the mask features? #8

aa200647963 closed 2 years ago

aa200647963 commented 2 years ago

In the config file, `with_visual_mask=False`. So do only the Transformer-based methods use the mask features?

Jingkang50 commented 2 years ago

Thank you for the question. The short answer is yes. We only use bbox features for two-stage methods. We did try to use mask features in the following two ways:

We found that both practices brought only negligible improvement, so we did not include them in the end.

Jingkang50 commented 2 years ago

@LilyDaytoy Hi Wenxuan, could you push a branch with mask-aware ROI Align, which we tried before? The branch is only for participants' reference.

LilyDaytoy commented 2 years ago

@aa200647963 Hi, to use mask features in the PSG two-stage pipeline, you can check my branch https://github.com/LilyDaytoy/OpenPSG/tree/mask_roi for reference :)

For details of how the masked ROI features are extracted, see the `roi_forward_with_mask` function in `openpsg/models/roi_extractors/visual_spatial.py`. I basically mask the feature maps with the binary mask of each object and stack the results together, then reuse the RoIAlign module in mmdet to crop the "masked feature maps" with the corresponding ROIs. The final output of `roi_forward_with_mask` has 512 channels: 256 channels of bounding-box features and 256 channels of mask features stacked together. To enable mask features in the pipeline, see `configs/motifs/panoptic_fpn_r50_fpn_1x_predcls_psg.py` for an example (`require_masked_feats=True` and `in_channels=512` in `bbox_roi_extractor`).
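The idea above can be sketched roughly as follows. This is not the code from the branch, only a minimal NumPy illustration under a few assumptions: the masks are already resized to the feature-map resolution, ROIs are integer `(x1, y1, x2, y2)` boxes in feature-map coordinates, and a nearest-neighbour crop (`crop_and_resize`, a hypothetical helper) stands in for mmdet's RoIAlign. The function name `roi_feats_with_mask` is also made up for this example.

```python
import numpy as np


def crop_and_resize(feat, roi, out_size=7):
    """Hypothetical stand-in for RoIAlign: nearest-neighbour crop of a
    (C, H, W) feature map to `roi`, resampled to (C, out_size, out_size)."""
    x1, y1, x2, y2 = roi  # assumes x2 > x1 and y2 > y1, inside the map
    ys = np.linspace(y1, y2 - 1, out_size).round().astype(int)
    xs = np.linspace(x1, x2 - 1, out_size).round().astype(int)
    return feat[:, ys][:, :, xs]


def roi_feats_with_mask(feat, rois, masks, out_size=7):
    """Return (N, 2C, out_size, out_size): C box channels + C mask channels."""
    outputs = []
    for roi, mask in zip(rois, masks):
        # Plain bounding-box features.
        box_feat = crop_and_resize(feat, roi, out_size)
        # Zero the feature map outside the object's binary mask,
        # then crop the "masked feature map" with the same ROI.
        masked_map = feat * mask[None].astype(feat.dtype)
        mask_feat = crop_and_resize(masked_map, roi, out_size)
        # Stack box and mask features along the channel axis.
        outputs.append(np.concatenate([box_feat, mask_feat], axis=0))
    return np.stack(outputs)


# Toy usage: one 256-channel feature map, two objects.
feat = np.random.default_rng(0).standard_normal((256, 32, 32)).astype(np.float32)
rois = [(4, 4, 20, 20), (0, 0, 32, 32)]
masks = np.zeros((2, 32, 32), dtype=bool)
masks[0, 8:16, 8:16] = True  # object 0 covers part of its box
masks[1] = True              # object 1 covers the whole map
out = roi_feats_with_mask(feat, rois, masks)
print(out.shape)  # (2, 512, 7, 7)
```

With 256 input channels, each ROI ends up with 512 channels, which is why the extractor that consumes these features would need `in_channels=512`.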

There are also some smaller modifications to unify mask size and type; you can see all the changes here: https://github.com/LilyDaytoy/OpenPSG/compare/Jingkang50:OpenPSG:main...mask_roi

This branch has not been tested yet ;) but I hope it helps.