IDEA-Research / DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
Apache License 2.0

question about two-stage / self.two_stage_keep_all_tokens #149

Closed owen24819 closed 1 year ago

owen24819 commented 1 year ago

Hi,

Really nice work. I had a question about the two-stage model. I see that you set self.two_stage_keep_all_tokens = False in the cfg file. Did you find that looking at just the topk predicted tokens from the encoder gave better results than looking at all tokens from the encoder?

https://github.com/IDEA-Research/DINO/blob/66d7173cc4167934381a898b07c08507bdd96b63/models/dino/deformable_transformer.py#L408-L417

FengLi-ust commented 1 year ago

Looking at all the tokens (around 10-20k) did not bring obvious gains but can be slower than looking at the top-k (900) tokens.

In addition, the decoder only uses 900 queries, so we need to select the top 900 tokens eventually.
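The selection step described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name `select_topk_tokens` and the single-logit scoring head are assumptions for clarity, whereas in DINO the encoder output is scored by the classification head and the top 900 proposals seed the decoder queries.

```python
import torch

def select_topk_tokens(memory, score_head, k=900):
    """Keep only the k highest-scoring encoder tokens.

    memory:     (batch, num_tokens, d_model) encoder output
    score_head: callable mapping (..., d_model) -> (..., 1) logits
                (hypothetical stand-in for the classification head)
    """
    logits = score_head(memory).squeeze(-1)          # (batch, num_tokens)
    _, topk_idx = logits.topk(k, dim=1)              # indices of top-k tokens
    # Expand indices over the feature dim so gather picks whole token vectors.
    idx = topk_idx.unsqueeze(-1).expand(-1, -1, memory.size(-1))
    return memory.gather(1, idx)                     # (batch, k, d_model)
```

With `two_stage_keep_all_tokens = False` only these k token vectors (and their anchors) flow into the decoder, which is why keeping all 10-20k tokens mainly adds cost: the decoder consumes exactly k queries either way.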

owen24819 commented 1 year ago

Ah that makes a lot of sense. I am using much smaller images with 336 tokens so I did not even consider the computational cost. Thanks for the quick response!