IDEA-Research / DAB-DETR

[ICLR 2022] Official implementation of the paper "DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR"
Apache License 2.0

What is the purpose of the minus of max attention weight #8

Closed JosonChan1998 closed 2 years ago

JosonChan1998 commented 2 years ago

Hi, thanks for your nice work!

But I am confused about one part of the code. What is the purpose of subtracting the max attention weight?

https://github.com/IDEA-opensource/DAB-DETR/blob/9b637396d2d8eea16b39940cde8e7d34262cb2e2/models/DAB_DETR/attention.py#L381-L382

Looking for your reply!

SlongLiu commented 2 years ago

Thanks for your question. It is a trick to stabilize training; it does not affect the final results.
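
For reference, here is a minimal, hypothetical sketch of the idea (not the repo's exact code): softmax is shift-invariant, so subtracting the row-wise max of the attention logits changes nothing mathematically, but it prevents `exp()` from overflowing when logits grow large.

```python
import torch

def naive_softmax(x: torch.Tensor) -> torch.Tensor:
    e = torch.exp(x)                      # overflows to inf for large logits
    return e / e.sum(dim=-1, keepdim=True)

def stable_softmax(x: torch.Tensor) -> torch.Tensor:
    x = x - x.max(dim=-1, keepdim=True).values  # the "minus max" step
    e = torch.exp(x)
    return e / e.sum(dim=-1, keepdim=True)

logits = torch.tensor([[1000.0, 1001.0, 1002.0]])
print(naive_softmax(logits))   # tensor([[nan, nan, nan]])
print(stable_softmax(logits))  # tensor([[0.0900, 0.2447, 0.6652]])
```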

JosonChan1998 commented 2 years ago

Thanks for your reply.

I want to ask another question about the effect of the query pos in the decoder layers. In each decoder layer, the query pos is used in both self-attention and cross-attention. Have you done an ablation study on the effect of using the query pos only in self-attn or only in cross-attn?

SlongLiu commented 2 years ago

I did not run ablations on the query pos in the decoder layers, for either self-attn or cross-attn. However, since our paper shows the importance of how the query pos is formulated, I think it has a large impact on the final performance.
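
For context, a rough sketch of where the query pos typically enters a DETR-style decoder layer is below; the module and names are illustrative, not the DAB-DETR implementation (which additionally modulates the positional embeddings from the dynamic anchor boxes).

```python
import torch
import torch.nn as nn

# Illustrative toy layer: query_pos is added to q/k of self-attention and to
# the query of cross-attention, which is what the ablation above would isolate.
class ToyDecoderLayer(nn.Module):
    def __init__(self, d_model: int = 256, nhead: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead)

    def forward(self, tgt, memory, query_pos, memory_pos):
        # self-attention among object queries: query_pos on both q and k
        q = k = tgt + query_pos
        tgt = tgt + self.self_attn(q, k, value=tgt)[0]
        # cross-attention to encoder memory: query_pos on q, memory_pos on k
        tgt = tgt + self.cross_attn(tgt + query_pos, memory + memory_pos, value=memory)[0]
        return tgt

layer = ToyDecoderLayer()
tgt = torch.zeros(100, 2, 256)    # (num_queries, batch, d_model)
mem = torch.zeros(1450, 2, 256)   # (H*W, batch, d_model)
out = layer(tgt, mem, torch.zeros_like(tgt), torch.zeros_like(mem))
```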

JosonChan1998 commented 2 years ago

OK! Thanks for your reply!