Why does modulating attention by w&h work?

IDEA-Research / DAB-DETR

[ICLR 2022] Official implementation of the paper "DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR"

Apache License 2.0


Why does modulating attention by w&h work? #49

Open SupetZYK opened 2 years ago

SupetZYK commented 2 years ago

I have a question about this line: https://github.com/IDEA-opensource/DAB-DETR/blob/main/models/DAB_DETR/transformer.py#L242

refHW_cond = self.ref_anchor_head(output).sigmoid() # nq, bs, 2

This line asks the model to predict the absolute values of w and h from 'output', but no supervision is applied to that prediction. In addition, the same 'output' tensor is also used to predict the offsets of the bbox (x, y, w, h).

So I am wondering whether the model can actually learn the width and height as expected.

SlongLiu commented 2 years ago

The results show that our models get performance gains with the modulated operation.
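
For readers wondering what the modulation does mechanically, here is a minimal sketch, not the repository's code. It assumes the formulation described in the DAB-DETR paper, where the x-part of the positional attention logit is multiplied by w_ref / w_anchor (and the y-part by h_ref / h_anchor). The function names, the embedding size, and the temperature below are illustrative assumptions. Only the x axis is shown. Under this reading, the predicted w_ref needs no direct supervision: it only acts as a learned scale on the attention logits, so it would receive gradients from the detection loss through cross-attention.

```python
import math

def sine_embed(pos, dim=16, temperature=20.0):
    """1-D sinusoidal embedding of a scalar position in [0, 1] (illustrative)."""
    emb = []
    for i in range(dim // 2):
        freq = temperature ** (2 * i / dim)
        angle = 2 * math.pi * pos / freq
        emb.extend([math.sin(angle), math.cos(angle)])
    return emb

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def positional_attention(x_anchor, w_anchor, w_ref, keys, dim=16):
    """Attention over key positions, with logits scaled by w_ref / w_anchor."""
    q = sine_embed(x_anchor, dim)
    scale = w_ref / w_anchor  # the modulation term (assumed form)
    logits = []
    for k in keys:
        dot = sum(a * b for a, b in zip(q, sine_embed(k, dim)))
        logits.append(scale * dot / math.sqrt(dim))
    return softmax(logits)

# Same w_ref, two anchors with different widths, both centred at x = 0.5.
keys = [i / 20 for i in range(21)]
narrow = positional_attention(0.5, w_anchor=0.1, w_ref=0.5, keys=keys)
wide = positional_attention(0.5, w_anchor=0.8, w_ref=0.5, keys=keys)

print("peak weight, narrow anchor:", max(narrow))
print("peak weight, wide anchor:  ", max(wide))
```

In this toy setup the narrow anchor gets a sharper attention peak and the wide anchor a flatter one, so the attention spread follows the anchor size. The value of w_ref only needs to be useful as a relative scale, not to match any ground-truth width.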