fundamentalvision / Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Apache License 2.0

fix Top K proposals in two_stage #198

Closed: seungyonglee0802 closed this 1 year ago

seungyonglee0802 commented 1 year ago

Reference: https://github.com/IDEA-Research/DINO/blob/main/models/dino/deformable_transformer.py#L342

Related to https://github.com/fundamentalvision/Deformable-DETR/issues/79

It is not reasonable to select foreground proposals using only the score of the first category (class 0): a proposal that scores highly for any other class would be ranked low.
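
To make the concern concrete, here is a minimal sketch comparing two ways of ranking encoder proposals: by the class 0 logit alone, and by the maximum logit over all classes. The tensor values, the `topk` value, and the variable names `topk_class0` and `topk_max` are made up for illustration; treating the max-over-classes ranking as what the referenced DINO line does is an assumption, not something verified here.

```python
import torch

# Hypothetical encoder class logits, shape (batch=1, num_proposals=4, num_classes=3).
enc_outputs_class = torch.tensor([[
    [2.0, -1.0, -1.0],   # proposal 0: confident in class 0
    [-3.0, 4.0, -1.0],   # proposal 1: confident in class 1
    [-2.0, -1.0, 3.0],   # proposal 2: confident in class 2
    [-4.0, -4.0, -4.0],  # proposal 3: low score for every class
]])
topk = 2  # illustrative value only

# Rank proposals by the class 0 logit alone.
topk_class0 = torch.topk(enc_outputs_class[..., 0], topk, dim=1)[1]

# Rank proposals by their best logit over all classes (assumed alternative).
topk_max = torch.topk(enc_outputs_class.max(-1)[0], topk, dim=1)[1]

print(topk_class0.tolist())
print(topk_max.tolist())
```

With these made-up logits, the class 0 ranking keeps proposals 0 and 2 and drops proposal 1, even though proposal 1 has the highest logit overall; the max-based ranking keeps proposals 1 and 2. Whether this difference matters depends on how the encoder head is trained, which is the point of the follow-up comment.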

seungyonglee0802 commented 1 year ago

        if 'enc_outputs' in outputs:
            enc_outputs = outputs['enc_outputs']
            bin_targets = copy.deepcopy(targets)
            for bt in bin_targets:
                bt['labels'] = torch.zeros_like(bt['labels'])
            indices = self.matcher(enc_outputs, bin_targets)
            for loss in self.losses:
                if loss == 'masks':
                    # Intermediate masks losses are too costly to compute, we ignore them.
                    continue
                kwargs = {}
                if loss == 'labels':
                    # Logging is enabled only for the last layer
                    kwargs['log'] = False
                l_dict = self.get_loss(loss, enc_outputs, bin_targets, indices, num_boxes, **kwargs)
                l_dict = {k + f'_enc': v for k, v in l_dict.items()}
                losses.update(l_dict)

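I missed that the encoder loss sets every target label to 0 before matching and computing the loss, so class 0 already acts as a single foreground class for the encoder outputs.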
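
As a small illustration of the relabeling step in the snippet above, the sketch below applies the same `copy.deepcopy` and `torch.zeros_like` pattern to a toy `targets` list. The label values are invented for the example, and the matcher and loss calls are left out.

```python
import copy

import torch

# Toy targets for two images; the label values are made up.
targets = [
    {'labels': torch.tensor([3, 7, 1])},
    {'labels': torch.tensor([5])},
]

# Copy the targets so the originals stay untouched, then collapse
# every label to class 0 (a single "foreground" class).
bin_targets = copy.deepcopy(targets)
for bt in bin_targets:
    bt['labels'] = torch.zeros_like(bt['labels'])

print([bt['labels'].tolist() for bt in bin_targets])
print([t['labels'].tolist() for t in targets])
```

Because the encoder outputs are supervised against these all-zero labels, the class 0 logit is trained to indicate whether a proposal is an object at all, regardless of the original category.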