ZhangGongjie / Meta-DETR

[T-PAMI 2022] Meta-DETR for Few-Shot Object Detection: Official PyTorch Implementation

Some questions about QSAttn. #76

**Open** · nhw649 opened this issue 11 months ago

nhw649 commented 11 months ago

(1) When computing the category codes, `src` is the support feature. Why do `src`, `category_code`, and `tsp` need to go through QSAttn? This is not mentioned in the paper. (2) Why is QSAttn applied only in the first encoder layer?

In `deformable_transformer.py`:

```python
def forward_supp_branch(self, src, pos, reference_points, spatial_shapes,
                        level_start_index, padding_mask, tsp, support_boxes):
    # deformable self-attention over the support feature tokens
    src2 = self.self_attn(self.with_pos_embed(src, pos), reference_points, src,
                          spatial_shapes, level_start_index, padding_mask)
    src = src + self.dropout1(src2)
    src = self.norm1(src)

    # ROI-align the support-box region of the (stride-32) support feature map
    support_img_h, support_img_w = spatial_shapes[0, 0], spatial_shapes[0, 1]
    supp_roi = torchvision.ops.roi_align(
        src.transpose(1, 2).reshape(src.shape[0], -1, support_img_h, support_img_w),
        support_boxes,
        output_size=(7, 7),
        spatial_scale=1 / 32.0,
        aligned=True)

    # global average pool + sigmoid -> per-episode category code
    supp_roi = supp_roi.mean(3).mean(2)  # [episode_size, C]
    category_code = supp_roi.sigmoid()   # [episode_size, C]

    if self.QSAttn:
        # siamese attention
        src, tsp = self.siamese_attn(src,
                                     inverse_sigmoid(category_code).unsqueeze(0).expand(src.shape[0], -1, -1),
                                     category_code.unsqueeze(0).expand(src.shape[0], -1, -1),
                                     tsp)

        # ffn
        src = self.forward_ffn(src + tsp)
    else:
        src = self.forward_ffn(src)

    return src, category_code
```
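
For reference, the category-code computation above boils down to the following standalone sketch (a minimal illustration, not the repo's exact code; shapes follow the comments in the snippet):

```python
import torchvision

def compute_category_code(src, support_boxes, spatial_shapes):
    """Minimal sketch: support feature tokens -> per-episode category code."""
    # src: [episode_size, H*W, C] support tokens from the encoder layer
    h, w = spatial_shapes[0, 0], spatial_shapes[0, 1]
    feat = src.transpose(1, 2).reshape(src.shape[0], -1, h, w)  # [episode_size, C, H, W]
    # pool the support-box region to a fixed 7x7 grid on the stride-32 feature map
    roi = torchvision.ops.roi_align(
        feat, support_boxes, output_size=(7, 7),
        spatial_scale=1 / 32.0, aligned=True)
    # global average pool + sigmoid -> one C-dimensional code per support image
    return roi.mean(dim=(2, 3)).sigmoid()  # [episode_size, C]
```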
nanfangAlan commented 11 months ago

You could find answers in Sec 4.2 and Table 8.

ZhangGongjie commented 11 months ago

@Yuxuan-W The implementation is the same as in the paper. And if you read the code carefully, you will find that CAM is not used at every encoder layer.
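
For example, a quick way to verify this (a hypothetical snippet, assuming the encoder keeps its layers in a `ModuleList` and each layer carries the `QSAttn` flag used in the code above; the attribute path is illustrative):

```python
# Hypothetical check: list which encoder layers have QSAttn (CAM) enabled.
for i, layer in enumerate(model.transformer.encoder.layers):
    print(i, getattr(layer, "QSAttn", False))
# Expected: True for the first layer only.
```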

nhw649 commented 11 months ago

> @Yuxuan-W The implementation is the same as in the paper. And if you read the code carefully, you will find that CAM is not used at every encoder layer.

When computing the category codes, `src` is the support feature. Why do `src`, `category_code`, and `tsp` need to go through QSAttn? This is not mentioned in the paper and cannot be seen from Fig. 4.

(See the `forward_supp_branch` code in `deformable_transformer.py` quoted above.)
Yuxuan-W commented 11 months ago

> @Yuxuan-W The implementation is the same as in the paper. And if you read the code carefully, you will find that CAM is not used at every encoder layer.

Sorry for my careless reading; I have removed the comment for clarity.

ZhangGongjie commented 11 months ago

@Yuxuan-W No problem. Any discussion is welcome.

@nhw649 Please allow me some time to address your concern, as I have a full-time job now and haven't read my own paper in a long time.

nhw649 commented 11 months ago

> @Yuxuan-W No problem. Any discussion is welcome.
>
> @nhw649 Please allow me some time to address your concern, as I have a full-time job now and haven't read my own paper in a long time.

Okay, please do. Thank you.

Yuxuan-W commented 11 months ago

> @Yuxuan-W The implementation is the same as in the paper. And if you read the code carefully, you will find that CAM is not used at every encoder layer.

> When computing the category codes, `src` is the support feature. Why do `src`, `category_code`, and `tsp` need to go through QSAttn? This is not mentioned in the paper and cannot be seen from Fig. 4.

> (See the `forward_supp_branch` code in `deformable_transformer.py` quoted above.)

In `DeformableTransformerEncoder`, you will find that `self.QSAttn` is `True` only for the first layer. However, if you look at the category code computed in the first layer, namely `category_code[0]`, you will find that it is not affected by `siamese_attn`. In other words, only `category_code[1:]` are affected by `siamese_attn`, but those codes are never used.

So I guess it is just for implementation convenience. There is no problem, and it is aligned with the paper.
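
To make that concrete, here is a rough structural sketch of the support branch across encoder layers (names and structure are assumptions for illustration, not the repo's exact code):

```python
import torch.nn as nn

class EncoderSketch(nn.Module):
    """Illustrative sketch of how QSAttn can be gated to the first layer only."""
    def __init__(self, make_layer, num_layers):
        super().__init__()
        # QSAttn (siamese attention) enabled only for layer 0
        self.layers = nn.ModuleList(
            [make_layer(QSAttn=(i == 0)) for i in range(num_layers)]
        )

    def forward_supp_branch(self, src, tsp, **kwargs):
        category_codes = []
        for layer in self.layers:
            src, code = layer.forward_supp_branch(src, tsp=tsp, **kwargs)
            category_codes.append(code)
        # Only category_codes[0] (computed before siamese_attn in layer 0) is
        # consumed downstream; category_codes[1:] end up unused, as noted above.
        return src, category_codes[0]
```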

ZhangGongjie commented 10 months ago

I hope Yuxuan-W's comments have addressed your concerns, @nhw649. Thank you, @Yuxuan-W!