I don't quite understand the sentence "removing the necessity for predicting 2D anchor points and sampling offsets"

SysCV / cascade-detr

[ICCV'23] Cascade-DETR: Delving into High-Quality Universal Object Detection

https://arxiv.org/abs/2307.11035

Apache License 2.0


I don't quite understand the sentence "removing the necessity for predicting 2D anchor points and sampling offsets" #9

Closed. yuanqianguang closed this issue 6 months ago

yuanqianguang commented 6 months ago

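Hello. In your paper, where the method is applied to DN-DETR and DINO, you write: "Cascade-DINO outperforms DINO by 1.0 AP75 with a much simpler attention design, removing the necessity for predicting 2D anchor points and sampling offsets." I don't quite understand this sentence. According to the code, the method still uses explicit coordinates: an offset is mapped from hs and added to the reference point. Why, then, does the paper say that predicting 2D anchor points and sampling offsets is no longer needed? I would appreciate an explanation. Thank you.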

yuanqianguang commented 6 months ago

Ah =-= got it. The sentence is there because cascade attention replaces deformable attention.
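For readers who hit the same confusion, here is a minimal sketch of the two mechanisms being conflated. It is not the repository's code. The function names (`refine_box`, `box_attention_mask`) are hypothetical. Modeling cascade attention as a mask that limits cross-attention to the previously predicted box is an assumption based on the paper's description. The first function is the box refinement the question describes (an offset mapped from hs and added to the reference point, here in logit space as in Deformable DETR-style decoders). The second shows why no per-query sampling offsets are needed once attention is restricted by the box itself.

```python
import math


def inverse_sigmoid(x, eps=1e-5):
    """Logit of x, clamped away from 0 and 1 for numerical safety."""
    x = min(max(x, eps), 1 - eps)
    return math.log(x / (1 - x))


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def refine_box(reference_box, delta):
    """Box refinement: add a predicted delta to the reference box in logit space.

    reference_box is (cx, cy, w, h), normalised to [0, 1].
    delta stands in for the output of a box head applied to hs (hypothetical).
    """
    return [sigmoid(inverse_sigmoid(r) + d) for r, d in zip(reference_box, delta)]


def box_attention_mask(box, height, width):
    """Return a height x width grid that is True where a cell centre lies in the box.

    Assumption: cross-attention is limited to these cells, so the layer does not
    have to predict 2D anchor points or sampling offsets for each query.
    """
    cx, cy, w, h = box
    x0, x1 = cx - w / 2, cx + w / 2
    y0, y1 = cy - h / 2, cy + h / 2
    mask = []
    for i in range(height):
        y = (i + 0.5) / height  # normalised y of the cell centre
        row = []
        for j in range(width):
            x = (j + 0.5) / width  # normalised x of the cell centre
            row.append(x0 <= x <= x1 and y0 <= y <= y1)
        mask.append(row)
    return mask


# The box from the previous decoder layer defines where the next layer attends.
previous_box = [0.5, 0.5, 0.5, 0.5]
mask = box_attention_mask(previous_box, height=4, width=4)

# Box refinement still happens after attention; only sampling offsets are gone.
next_box = refine_box(previous_box, delta=[0.2, 0.0, -0.1, 0.0])
```

So the offset added to the reference point in the code is the box-regression delta, which both DINO and Cascade-DINO keep. The "sampling offsets" in the quoted sentence are the ones deformable attention predicts to choose where to sample features.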