question about the model components in DINO

chensnathan commented 2 years ago

Hi, after reading the paper of the state-of-the-art detector DINO, I have one question about the details.

In Appendix D3, "Detailed model components", the paper says: "we find the conditional queries used in DAB-DETR does not suit our model". What does "conditional queries" mean here? Is it the idea of decoupling the object query into a content part and a position part (in this paper, you replace [q_c, q_p] with q_c + q_p)? Or is it the scale vector (you remove the scale vector for the positional encoding)?

Looking forward to a reply. Thanks in advance!

SlongLiu commented 2 years ago

"Conditional queries" means the latter, i.e. the scale vector. We simply remove the positional scales predicted from the content queries.
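
For illustration only, here is a minimal pure-Python sketch of the difference, assuming the scale is applied as an element-wise multiplication on the positional embedding. It is not the code from the DINO or DAB-DETR repositories: `sine_embed`, `toy_scale_from_content`, `positional_query`, the toy dimension of 4, and the `1 + tanh` scale are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import math


def sine_embed(coord, dim=4, temperature=10000.0):
    """Sinusoidal embedding of one normalized coordinate (toy size, dim=4)."""
    out = []
    for i in range(dim // 2):
        freq = temperature ** (2 * i / dim)
        angle = 2 * math.pi * coord / freq
        out.extend([math.sin(angle), math.cos(angle)])
    return out


def toy_scale_from_content(content):
    """Hypothetical stand-in for the MLP that predicts a scale vector
    from the content query (a real model would learn this mapping)."""
    return [1.0 + math.tanh(c) for c in content]


def positional_query(anchor_embed, content, use_conditional_scale):
    """Return the positional part of the query.

    With use_conditional_scale=True, the positional embedding is multiplied
    element-wise by a scale predicted from the content query.
    With False, the scale is dropped and the embedding is used as-is.
    """
    if use_conditional_scale:
        scale = toy_scale_from_content(content)
        return [s * p for s, p in zip(scale, anchor_embed)]
    return list(anchor_embed)


content = [0.5, -0.5, 0.0, 1.0]   # toy content query
pos = sine_embed(0.25)            # toy embedding of one anchor coordinate

conditional = positional_query(pos, content, use_conditional_scale=True)
plain = positional_query(pos, content, use_conditional_scale=False)
```

In this sketch, `plain` depends only on the anchor, while `conditional` also changes whenever the content query changes. Removing the scale corresponds to the `use_conditional_scale=False` branch.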

chensnathan commented 2 years ago

Got it! Thanks.
