Atten4Vis / MS-DETR

[CVPR 2024] The official implementation for "MS-DETR: Efficient DETR Training with Mixed Supervision"
Apache License 2.0

Problems with code logic #9

[Open] Code-of-Liujie opened this issue 3 months ago

Code-of-Liujie commented 3 months ago
# Excerpt from the transformer's forward pass; bs, c, memory, mask_flatten, spatial_shapes,
# level_start_index, valid_ratios and query_embed are defined earlier in that method (not shown).
if self.two_stage:
    output_memory, output_proposals = self.gen_encoder_output_proposals(memory, mask_flatten, spatial_shapes)

    # hack implementation for two-stage Deformable DETR
    enc_outputs_class = self.decoder.class_embed[self.decoder.num_layers](output_memory)
    enc_outputs_coord_unact = self.decoder.bbox_embed[self.decoder.num_layers](output_memory) + output_proposals

    topk = self.two_stage_num_proposals
    topk_proposals = torch.topk(enc_outputs_class[..., 0], topk, dim=1)[1]
    topk_coords_unact = torch.gather(enc_outputs_coord_unact, 1, topk_proposals.unsqueeze(-1).repeat(1, 1, 4))
    topk_coords_unact = topk_coords_unact.detach()
    reference_points = topk_coords_unact.sigmoid()
    init_reference_out = reference_points
    pos_trans_out = self.pos_trans_norm(self.pos_trans(self.get_proposal_pos_embed(topk_coords_unact)))
    query_embed, tgt = torch.split(pos_trans_out, c, dim=2)
else:
    query_embed, tgt = torch.split(query_embed, c, dim=1)
    query_embed = query_embed.unsqueeze(0).expand(bs, -1, -1)
    tgt = tgt.unsqueeze(0).expand(bs, -1, -1)
    reference_points = self.reference_points(query_embed).sigmoid()
    init_reference_out = reference_points

# decoder
hs, inter_references = self.decoder(tgt, reference_points, memory,
                                    spatial_shapes, level_start_index, valid_ratios, query_embed, mask_flatten)

inter_references_out = inter_references
if self.two_stage:
    return hs, init_reference_out, inter_references_out, enc_outputs_class, enc_outputs_coord_unact, output_proposals.sigmoid()
# Problem: output_proposals is only assigned in the two-stage branch above, so on this
# non-two-stage path the next line raises UnboundLocalError.
return hs, init_reference_out, inter_references_out, None, None, output_proposals.sigmoid()
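For context, the two-stage branch scores every encoder token, keeps the two_stage_num_proposals highest-scoring ones, and turns their unnormalized box coordinates into reference points with a sigmoid. Below is a rough plain-Python sketch of that selection for a single image. The helper name select_topk_proposals is hypothetical, and it only assumes per-token scores taken from the first channel of enc_outputs_class (as in enc_outputs_class[..., 0]) and 4-value unnormalized boxes. It mimics the torch.topk / torch.gather / sigmoid sequence and is not the MS-DETR code itself.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def select_topk_proposals(
    scores: Sequence[float], coords_unact: Sequence[Box], k: int
) -> Tuple[List[int], List[Box]]:
    """Hypothetical single-image version of the topk/gather/sigmoid steps."""
    # Indices of the k highest scores, best first (like torch.topk(..., dim=1)[1] for one image).
    topk_idx = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Gather the 4 unnormalized box coordinates of each selected token (the torch.gather step).
    picked = [coords_unact[i] for i in topk_idx]
    # Sigmoid maps every coordinate into (0, 1), giving the initial reference points.
    ref_points = [tuple(1.0 / (1.0 + math.exp(-v)) for v in box) for box in picked]
    return topk_idx, ref_points


scores = [0.1, 2.0, -1.0, 0.7]
coords = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0), (5.0, 5.0, 5.0, 5.0), (-1.0, 1.0, 0.0, 2.0)]
idx, refs = select_topk_proposals(scores, coords, k=2)
print(idx)      # [1, 3]
print(refs[0])  # (0.5, 0.5, 0.5, 0.5)
```

In the PyTorch code, the detach() call before the sigmoid means the decoder's reference points do not send gradients back through the selected encoder box coordinates. A plain-Python sketch has no equivalent of that step.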
Code-of-Liujie commented 3 months ago

(screenshot attached; not preserved)

jacksonsc007 commented 3 months ago

Two-stage is the default training setting. The error you hit occurs because your training script does not pass the "--two_stage" flag.

If you want to keep the non-two-stage setting, just change output_proposals.sigmoid() to None in the last return statement.
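You can reproduce both the failure and the suggested fix without PyTorch. In the sketch below, the function names and placeholder values are hypothetical and stand in for the real tensors. The buggy version reads output_proposals on the non-two-stage path, where it was never assigned, and raises UnboundLocalError. The fixed version returns None in that slot, as suggested above.

```python
import math
from typing import List, Optional, Tuple


def _sigmoid(values: List[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def forward_buggy(two_stage: bool) -> Tuple[str, List[float]]:
    """Mirrors the quoted control flow; placeholder values stand in for tensors."""
    if two_stage:
        output_proposals = [0.0, 0.0]  # only assigned on the two-stage path
    hs = "decoder output"
    if two_stage:
        return hs, _sigmoid(output_proposals)
    # Non-two-stage path still reads output_proposals, which was never assigned.
    return hs, _sigmoid(output_proposals)


def forward_fixed(two_stage: bool) -> Tuple[str, Optional[List[float]]]:
    """Same flow with the suggested change: None replaces output_proposals.sigmoid()."""
    if two_stage:
        output_proposals = [0.0, 0.0]
    hs = "decoder output"
    if two_stage:
        return hs, _sigmoid(output_proposals)
    return hs, None


def buggy_raises() -> bool:
    """True if the buggy version fails on the non-two-stage path."""
    try:
        forward_buggy(two_stage=False)
    except UnboundLocalError:
        return True
    return False


print(buggy_raises())                  # True
print(forward_fixed(two_stage=False))  # ('decoder output', None)
print(forward_fixed(two_stage=True))   # ('decoder output', [0.5, 0.5])
```

Whatever consumes the returned tuple must accept None in that slot on the non-two-stage path. This sketch does not check how the MS-DETR callers use that value.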