Why is the result of DINO (ours, Row5+contrastive DN) (47.9) different from DINO-4scale (49.0)?

IDEA-Research / DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"

Apache License 2.0

I noticed that the AP of DINO-4scale with R50 is 49.0% in Table 1, while DINO (ours, Row5+contrastive DN) in Table 4 is 47.9%. Which setting or part of the model design was changed between the two? It seems that this project gives 47.9%, while mmdetection reports 49.0%. I am not sure whether the code differs between these projects.

You can consider it a typo.

In fact, the early DINO implementation contained a bug in the initialization of the MSDA weights, which is why the early result is lower. The authors later found the bug and achieved better performance. They seem to have forgotten to update the number in some places.

In mmdetection, we followed the new implementation. The code is different, but the theory is the same.

Why is the result of DINO (ours, Row5+contrastive DN) (47.9) different from DINO-4scale (49.0)? #197