Atten4Vis / MS-DETR

[CVPR 2024] The official implementation for "MS-DETR: Efficient DETR Training with Mixed Supervision"
Apache License 2.0

Hello author, may I ask if the loss function (class SetCriterion) has a Boolean variable in self.indices_merge. If the variable is True, run the following code: "def indices_merge(num_queries, o2o_indices, o2m_indices): #16

Open 2308700388 opened 2 months ago

2308700388 commented 2 months ago

Hello author, I see that the loss function (class SetCriterion) has a Boolean variable, self.indices_merge. If it is True, the following code runs:

    def indices_merge(num_queries, o2o_indices, o2m_indices):
        bs = len(o2o_indices)
        temp_indices = torch.zeros(bs, num_queries, dtype=torch.int64).cuda() - 1
        new_one2many_indices = []

        for i in range(bs):
            one2many_fg_inds = o2m_indices[i][0].cuda()
            one2many_gt_inds = o2m_indices[i][1].cuda()
            one2one_fg_inds = o2o_indices[i][0].cuda()
            one2one_gt_inds = o2o_indices[i][1].cuda()
            temp_indices[i][one2one_fg_inds] = one2one_gt_inds
            temp_indices[i][one2many_fg_inds] = one2many_gt_inds
            fg_inds = torch.nonzero(temp_indices[i] >= 0).squeeze(1)
            # fg_inds = torch.argwhere(temp_indices[i] >= 0).squeeze(1)
            gt_inds = temp_indices[i][fg_inds]
            new_one2many_indices.append((fg_inds, gt_inds))

        return new_one2many_indices

What is the purpose of this code for hybrid supervision?

ZhaoChuyang commented 1 month ago

Hi,

The indices merging, as introduced in our paper, is designed to add the query obtained from one-to-one (o2o) matching to the set of queries that one-to-many (o2m) matching assigns to each ground truth. This provides a slight performance gain (around +0.2 mAP).
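As a rough illustration of that idea (a plain-Python sketch, not the repository code; the helper name `queries_per_gt` and the toy indices are made up for this example), the o2m assignment can be viewed as a set of queries per ground truth, to which the o2o-matched query is added:

```python
from collections import defaultdict

def queries_per_gt(query_inds, gt_inds):
    """Group query indices by the ground-truth index they are assigned to."""
    groups = defaultdict(set)
    for q, g in zip(query_inds, gt_inds):
        groups[g].add(q)
    return groups

# Toy assignment for one image with 2 ground truths (hypothetical values).
o2o = ([3, 7], [0, 1])              # query 3 -> gt 0, query 7 -> gt 1
o2m = ([1, 2, 5, 6], [0, 0, 1, 1])  # several queries per ground truth

merged = queries_per_gt(*o2m)
for q, g in zip(*o2o):
    merged[g].add(q)  # add the o2o-matched query to that ground truth's o2m set

merged = {g: sorted(qs) for g, qs in merged.items()}
print(merged)  # {0: [1, 2, 3], 1: [5, 6, 7]}
```

In this toy case the two matchings do not overlap, so the merge only enlarges each ground truth's query set by one.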

The purpose of this code snippet is to combine both sets of assignments into a single index map per image. The intent is that the results from o2o matching take priority over those from one-to-many (o2m) matching: when a query is matched by both, the assignment written later overwrites the one written earlier.
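To see the overwrite behaviour concretely, here is a small list-based mirror of the function quoted in the question (a sketch for illustration only; `indices_merge_py` and the toy indices are hypothetical, and it keeps the same write order as the quoted snippet, o2o first and o2m second):

```python
def indices_merge_py(num_queries, o2o_indices, o2m_indices):
    """List-based mirror of the quoted indices_merge, for illustration only."""
    merged = []
    for (o2o_q, o2o_gt), (o2m_q, o2m_gt) in zip(o2o_indices, o2m_indices):
        temp = [-1] * num_queries          # -1 marks a query with no ground truth
        for q, g in zip(o2o_q, o2o_gt):    # first write: o2o assignments
            temp[q] = g
        for q, g in zip(o2m_q, o2m_gt):    # second write: o2m assignments
            temp[q] = g
        fg = [q for q, g in enumerate(temp) if g >= 0]
        merged.append((fg, [temp[q] for q in fg]))
    return merged

# One image, 6 queries. Query 2 is matched by both: o2o -> gt 0, o2m -> gt 1.
o2o = [([2, 4], [0, 1])]
o2m = [([0, 2, 5], [0, 1, 1])]
fg_inds, gt_inds = indices_merge_py(6, o2o, o2m)[0]
print(fg_inds, gt_inds)  # [0, 2, 4, 5] [0, 1, 1, 1]
```

With this write order, the value kept for a query matched by both is the one written second (here the o2m ground truth for query 2); swapping the two loops would keep the o2o value instead. It is worth checking which order the current repository code uses.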

However, the impact of indices merging on final performance is minimal, and in most cases you can achieve comparable results without using it.