Difference of MaskDINO implementation between `detrex` and `MaskDINO` repo

YanShuang17 commented 1 year ago

Location: `xxx/modeling/pixel_decoder/maskdino_encoder.py`, in the `forward` method of `MSDeformAttnTransformerEncoderLayer`:

MaskDINO implementation:

```python
def forward(self, src, pos, reference_points, spatial_shapes, level_start_index, padding_mask=None):
    # self attention: the layer itself adds pos to the query and applies the residual
    src2 = self.self_attn(
        self.with_pos_embed(src, pos), reference_points, src,
        spatial_shapes, level_start_index, padding_mask,
    )
    src = src + self.dropout1(src2)
    src = self.norm1(src)

    # ffn
    src = self.forward_ffn(src)

    return src
```

detrex implementation (the residual is lost):


def forward(self, src, pos, reference_points, spatial_shapes, level_start_index, padding_mask=None):
        # self attention
        src2 = self.self_attn(query=src,query_pos=pos, reference_points=reference_points, value=src,spatial_shapes= spatial_shapes, level_start_index=level_start_index, key_padding_mask=padding_mask)
        src =src2
        src = self.norm1(src)

        # ffn
        src = self.forward_ffn(src)

        return src

@HaoZhang534

HaoZhang534 commented 1 year ago

Hi @YanShuang17, this difference comes from the different implementations of deformable attention in the original MaskDINO repo and in detrex. In detrex, the deformable attention module itself adds the positional query to the query and applies the residual connection, so the encoder layer does not need to repeat them.
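To make the reply concrete, here is a minimal, framework-free sketch (plain Python lists instead of tensors) comparing the two call patterns. All names are hypothetical. `toy_attn` stands in for deformable attention, and `detrex_style_attn` only loosely mimics the behavior described above: it keeps the query as `identity`, adds `query_pos`, attends, then adds `identity` back. It is not the actual detrex API. Dropout is omitted, as in eval mode. The shared `forward_ffn` step is also skipped, because both layers apply it identically afterward.

```python
import math
from typing import List, Optional

Vec = List[float]


def with_pos_embed(x: Vec, pos: Optional[Vec]) -> Vec:
    # Element-wise x + pos; returns x unchanged when pos is None.
    return x if pos is None else [a + b for a, b in zip(x, pos)]


def layer_norm(x: Vec, eps: float = 1e-5) -> Vec:
    # LayerNorm over one feature vector, without learnable scale/shift.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def toy_attn(query: Vec, value: Vec) -> Vec:
    # Hypothetical stand-in for deformable attention: returns only the update,
    # with no residual. Any deterministic function of (query, value) works here.
    return [0.5 * q + 0.25 * v for q, v in zip(query, value)]


def maskdino_style_layer(src: Vec, pos: Vec) -> Vec:
    # MaskDINO pattern: the layer adds pos to the query and adds the residual.
    src2 = toy_attn(with_pos_embed(src, pos), src)
    src = [s + d for s, d in zip(src, src2)]  # dropout1 omitted (eval mode)
    return layer_norm(src)


def detrex_style_attn(query: Vec, value: Vec, query_pos: Optional[Vec] = None,
                      identity: Optional[Vec] = None) -> Vec:
    # Hypothetical wrapper: keep the un-embedded query as identity,
    # add query_pos, attend, then add identity back (the residual).
    if identity is None:
        identity = query
    out = toy_attn(with_pos_embed(query, query_pos), value)
    return [i + o for i, o in zip(identity, out)]


def detrex_style_layer(src: Vec, pos: Vec) -> Vec:
    # detrex pattern: src = src2 suffices because the residual is already inside.
    src2 = detrex_style_attn(query=src, value=src, query_pos=pos)
    src = src2
    return layer_norm(src)


def no_residual_layer(src: Vec, pos: Vec) -> Vec:
    # What the layer would compute if the attention really dropped the residual.
    return layer_norm(toy_attn(with_pos_embed(src, pos), src))


if __name__ == "__main__":
    src = [1.0, 2.0, 3.0, 4.0]
    pos = [0.5, -0.5, 0.0, 1.0]
    print(maskdino_style_layer(src, pos))
    print(detrex_style_layer(src, pos))
    print(no_residual_layer(src, pos))
```

With these inputs the first two layers return identical vectors and the third returns a different one. Moving the residual into the attention module changes where it is written, not what is computed.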

YanShuang17 commented 1 year ago

OK, got it, thanks.

IDEA-Research / detrex, issue #172