Open ChiefGodMan opened 1 week ago
We believe that due to the feature blending caused by the gate function, the order of the positional and feature information at the beginning and end does not matter. Ultimately, as long as there is a consistent direction in the loss, the model can effectively utilize context and positional information.
In your opinion, are the values and functions of query and query_pos equivalent?
The two tensors mask_aware_query_feat and mask_aware_query_pos come from PositionalQueryGenerator(): https://github.com/SehwanChoi0307/Mask2Map/blob/6281f3592503ecf74450d19a30b0e29889ce335f/projects/mmdet3d_plugin/mask2map/modules/transformer_2p.py#L524
pts_query_feat and pts_query_pos are split from the same embedding weight, but mask_aware_query_feat_pia and mask_aware_query_pos come from entirely different modules. The former is produced by multiple Mask2FormerTransformerDecoderLayer layers; the latter is produced by self.decoder_positional_encoding and then multiplied by the sigmoid of the mask_pred value. https://github.com/SehwanChoi0307/Mask2Map/blob/6281f3592503ecf74450d19a30b0e29889ce335f/projects/mmdet3d_plugin/mask2map/modules/transformer_2p.py#L498
Finally, in the MaskGuidedMapDecoder() function, the concatenated tensor is updated with mask_aware_query += self.out_proj(query). Yes, the original (query, query_pos) order is always kept. https://github.com/SehwanChoi0307/Mask2Map/blob/6281f3592503ecf74450d19a30b0e29889ce335f/projects/mmdet3d_plugin/mask2map/modules/transformer_2p.py#L749
Of course, if we change
query = mask_aware_query + self.out_proj(attn_output + gate * (self.lin_self(mask_aware_query_norm) - attn_output))
to
query = self.out_proj(attn_output + gate * (self.lin_self(mask_aware_query_norm) - attn_output))
then the gate tensor may change the order.
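To make the difference between the two variants concrete, here is a minimal sketch in plain Python, with lists standing in for tensors. It is not the repository code: gated_blend, swap_halves and update are hypothetical helpers, and swap_halves is a deliberately extreme stand-in for self.out_proj (a learned linear layer could in principle mix or permute channels in this way). The gate expression attn_output + gate * (x - attn_output) is taken from the lines quoted above.

```python
def gated_blend(attn_output, self_feat, gate):
    """Elementwise attn + gate * (self_feat - attn),
    which equals (1 - gate) * attn + gate * self_feat."""
    return [a + g * (s - a) for a, s, g in zip(attn_output, self_feat, gate)]


def swap_halves(x):
    """Hypothetical stand-in for out_proj: swaps the two channel halves."""
    half = len(x) // 2
    return x[half:] + x[:half]


def update(mask_aware_query, attn_output, self_feat, gate, out_proj, residual):
    """Apply the gated update, with or without the residual term."""
    projected = out_proj(gated_blend(attn_output, self_feat, gate))
    if residual:
        # query = mask_aware_query + out_proj(...)
        return [q + p for q, p in zip(mask_aware_query, projected)]
    # query = out_proj(...)
    return projected


# Channels 0-1 play the role of "feature", channels 2-3 of "position".
mask_aware_query = [10.0, 10.0, -10.0, -10.0]
attn_output = [1.0, 1.0, 2.0, 2.0]
self_feat = [3.0, 3.0, 4.0, 4.0]  # stands in for lin_self(mask_aware_query_norm)
gate = [0.0, 1.0, 0.5, 0.5]

blended = gated_blend(attn_output, self_feat, gate)
with_residual = update(mask_aware_query, attn_output, self_feat, gate,
                       swap_halves, residual=True)
without_residual = update(mask_aware_query, attn_output, self_feat, gate,
                          swap_halves, residual=False)

print(blended)           # gate 0 keeps attn_output, gate 1 keeps self_feat
print(with_residual)     # input halves still dominate their original slots
print(without_residual)  # layout is decided entirely by out_proj
```

With the residual term, mask_aware_query is added back in its original channel order, so the input layout stays anchored regardless of what the projection does. Without it, every channel passes through the projection, and nothing ties the output halves to the input halves.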
In the GeometricFeatureExtractor() function, the concatenation order is (query, pos); see the line below: https://github.com/SehwanChoi0307/Mask2Map/blob/6281f3592503ecf74450d19a30b0e29889ce335f/projects/mmdet3d_plugin/mask2map/modules/transformer_2p.py#L590 . However, in the MaskGuidedMapDecoder() function, the split order is (pos, query); see the line below: https://github.com/SehwanChoi0307/Mask2Map/blob/6281f3592503ecf74450d19a30b0e29889ce335f/projects/mmdet3d_plugin/mask2map/modules/transformer_2p.py#L752 . Why?
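The bookkeeping behind this question can be illustrated with a small sketch in plain Python, where lists stand in for tensors. The helpers concat and split_in_half are hypothetical and only mimic concatenation and chunking along the channel dimension. The sketch leaves out the learned layers that sit between the two steps in the actual model, so it shows only what the naming mismatch means when nothing in between remaps the channels.

```python
def concat(first, second):
    """Concatenate two channel groups, first group in the leading slots."""
    return first + second


def split_in_half(x):
    """Split a channel list into its leading and trailing halves."""
    half = len(x) // 2
    return x[:half], x[half:]


query = ["q0", "q1"]
pos = ["p0", "p1"]

# Concatenation in (query, pos) order.
fused = concat(query, pos)

# Split whose outputs are named in (pos, query) order.
pos_out, query_out = split_in_half(fused)

print(fused)      # query channels first, then pos channels
print(pos_out)    # holds the original query channels
print(query_out)  # holds the original pos channels
```

In this stripped-down setting the two names simply end up swapped. Whether that matters in the real decoder depends on the learned layers between the concatenation and the split, which this sketch does not model.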