topk_proposals = torch.topk(enc_outputs_class[..., 0], topk, dim=1)[1]

fundamentalvision / Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.

Apache License 2.0


topk_proposals = torch.topk(enc_outputs_class[..., 0], topk, dim=1)[1] #79

Open Wangzs0228 opened 3 years ago

Wangzs0228 commented 3 years ago

Why is index 0 used in this code?

superaha commented 3 years ago

Same question here.

The passage below is from the original paper. However, this logic does not seem to be implemented in the current code base.

In the first stage, given the output feature maps of the encoder, a detection head is applied to each pixel. The detection head is of a 3-layer FFN for bounding box regression, and a linear projection for bounding box binary classification (i.e., foreground and background), respectively.
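For illustration only, here is a minimal sketch of the head that the quoted passage describes. It is not taken from the repository: the class name `FirstStageHead`, the hidden sizes, and the choice of a single sigmoid logit for foreground versus background are all assumptions.

```python
import torch
from torch import nn


class FirstStageHead(nn.Module):
    """Hypothetical per-pixel head: box regression FFN plus binary classifier."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        # 3-layer FFN for bounding box regression (4 box coordinates per pixel).
        self.bbox_ffn = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, 4),
        )
        # Linear projection for binary classification.
        # Assumption: one logit per pixel, meant to be passed through a sigmoid.
        self.objectness = nn.Linear(d_model, 1)

    def forward(self, memory: torch.Tensor):
        # memory: (batch, num_pixels, d_model), the flattened encoder output.
        return self.objectness(memory), self.bbox_ffn(memory)


head = FirstStageHead(d_model=32)
memory = torch.randn(2, 10, 32)        # made-up sizes for the example
logits, boxes = head(memory)
print(logits.shape, boxes.shape)
```

With a single-logit head like this one, `logits[..., 0]` would simply be the foreground score of each pixel.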

Christinepan881 commented 2 years ago

Same question. It seems that num_classes in self.decoder.class_embed is the same as the number of classes in the whole dataset.

volcanolee4 commented 2 years ago

Although they use sigmoid + BCE loss, I doubt whether it is reasonable to judge foreground only by the score of the first category.
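To make the concern concrete, here is a small sketch of what the line in question computes, using made-up tensor sizes. The comparison against the maximum over all classes is only an illustration of an alternative ranking and is not something the repository does.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes: 2 images, 6 encoder positions, 3 classes, keep 2 proposals.
batch_size, num_positions, num_classes, topk = 2, 6, 3, 2
enc_outputs_class = torch.randn(batch_size, num_positions, num_classes)

# The line under discussion: rank positions by the logit of class index 0 only.
first_class_logits = enc_outputs_class[..., 0]                   # (batch, positions)
topk_proposals = torch.topk(first_class_logits, topk, dim=1)[1]  # indices, (batch, topk)

# Illustrative alternative (assumption, not from the repository):
# rank positions by their highest logit over all classes.
max_class_logits = enc_outputs_class.max(dim=-1).values
topk_by_max = torch.topk(max_class_logits, topk, dim=1)[1]

print(topk_proposals)
print(topk_by_max)
```

The two index sets can differ whenever a position scores highly for a class other than the first one, which is the case the question is about.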