Hi, after reading the paper on the state-of-the-art detector DINO, I have a question about one detail.
In Appendix D3, "Detailed model components", the paper says: "we find the conditional queries used in DAB-DETR does not suit our model". What does "conditional queries" mean here? Does it refer to decoupling the object query into a content part and a positional part (in this paper, you replace the concatenation [q_c, q_p] with the sum q_c + q_p)? Or does it refer to the scale vector (you remove the scale vector applied to the positional encoding)?
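To make the two readings concrete, here is a minimal sketch in plain Python of what I mean. The vectors, their values, and the function names are all made up for illustration; none of them are taken from the DINO or DAB-DETR code, and the real models operate on batched tensors rather than small lists.

```python
# Hypothetical toy vectors; values are illustrative only.
q_c = [0.5, -1.0, 2.0, 0.0]    # content part of one object query
q_p = [1.0, 1.0, -1.0, 3.0]    # positional part of the same query
scale = [2.0, 0.5, 1.0, 0.0]   # assumed element-wise scale vector


def concat_query(content, position):
    """Reading 1, decoupled form: concatenate the parts as [q_c, q_p]."""
    return content + position  # list concatenation, so the length doubles


def sum_query(content, position):
    """Reading 1, merged form: add the parts element-wise as q_c + q_p."""
    return [c + p for c, p in zip(content, position)]


def scale_position(position, scale_vec):
    """Reading 2: modulate the positional part element-wise by a scale vector."""
    return [p * s for p, s in zip(position, scale_vec)]


if __name__ == "__main__":
    print(concat_query(q_c, q_p))     # decoupled query, twice the dimension
    print(sum_query(q_c, q_p))        # merged query, same dimension
    print(scale_position(q_p, scale)) # scaled positional part
```

So my question is which of these the sentence refers to: dropping the concatenation in favor of the sum, dropping the scaling step, or both.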
Looking forward to a reply. Thanks in advance!