IDEA-Research / DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
Apache License 2.0

Question about the model components in DINO #5

Closed by chensnathan 2 years ago

chensnathan commented 2 years ago

Hi, after reading the paper on the state-of-the-art detector DINO, I have a question about one of the details.

In Appendix D.3, "Detailed model components", the paper says: "we find the conditional queries used in DAB-DETR does not suit our model". What does "conditional queries" mean here? Is it the idea of decoupling the object query into a content part and a position part (in this paper, you replace [q_c, q_p] with q_c + q_p)? Or is it the scale vector (you remove the scale vector applied to the positional encoding)?

Looking forward to a reply. Thanks in advance!

SlongLiu commented 2 years ago

"Conditional queries" refers to the latter. We simply remove the positional scales predicted from the content queries.
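
For readers who want to see the difference concretely, here is a minimal, framework-free sketch of the idea in the reply above. It is not taken from the DINO or DAB-DETR code. The names `sine_embed`, `positional_query`, and `toy_scale_fn` are hypothetical, the embedding is a simplified sinusoidal encoding of the anchor center only, and `toy_scale_fn` is an arbitrary stand-in for the learned network that would predict the scale vector from the content query.

```python
import math


def sine_embed(coord, dim=4, temperature=10000.0):
    """Toy sinusoidal embedding of one normalized coordinate in [0, 1]."""
    out = []
    for i in range(dim // 2):
        freq = temperature ** (2 * i / dim)
        angle = 2 * math.pi * coord / freq
        out.extend([math.sin(angle), math.cos(angle)])
    return out


def positional_query(anchor_xy, content_query, scale_fn=None, dim=4):
    """Build the positional part of a decoder query from an anchor center.

    scale_fn=None  -> plain positional embedding (the scale is removed,
                      as described in the reply above).
    scale_fn given -> the embedding is multiplied element-wise by a scale
                      vector predicted from the content query.
    """
    x, y = anchor_xy
    pos = sine_embed(x, dim) + sine_embed(y, dim)  # list concat, length 2 * dim
    if scale_fn is None:
        return pos
    scale = scale_fn(content_query)
    return [s * p for s, p in zip(scale, pos)]


def toy_scale_fn(content_query):
    # Hypothetical stand-in for a learned MLP; any map from the content
    # query to a vector of the same length would do for illustration.
    return [1.0 + 0.5 * c for c in content_query]


anchor = (0.25, 0.75)
content = [0.2, -0.4, 0.0, 1.0, -1.0, 0.6, 0.3, -0.1]

scaled = positional_query(anchor, content, scale_fn=toy_scale_fn)
unscaled = positional_query(anchor, content)
```

In this sketch `scaled` depends on the content query while `unscaled` depends only on the anchor, which is the distinction the question was asking about.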

chensnathan commented 2 years ago

Got it! Thanks.