IDEA-Research / DAB-DETR

[ICLR 2022] Official implementation of the paper "DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR"
Apache License 2.0

A question about attention weight calculation #48

Closed Li-Qingyun closed 2 years ago

Li-Qingyun commented 2 years ago

Hi~ Thanks for your excellent work! I'm confused about an operation in the attention weight calculation.

In the attention implementation, there is a small modification that I have not found in the paper.

The code is:

from torch.nn.functional import softmax

# previous choice of Conditional DETR and nn.MultiheadAttention
attn_output_weights = softmax(attn_output_weights, dim=-1)

# DAB-DETR modifies this line to subtract the row-wise max before the softmax:
attn_output_weights = softmax(attn_output_weights - attn_output_weights.max(dim=-1, keepdim=True)[0], dim=-1)

Does this procedure come from some previous study that I have not read? And does it improve the performance?

SlongLiu commented 2 years ago

It is a trick to stabilize the training of attention modules. It has little influence on the final performance.
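
This is the standard max-subtraction trick for a numerically stable softmax: since softmax(x - c) = softmax(x) for any constant c, subtracting the row-wise maximum leaves the attention weights mathematically unchanged while keeping every exponent at or below zero, so exp() cannot overflow. A minimal standalone sketch (not repo code, just an illustration of the trick on a row of large logits):

import torch

x = torch.tensor([[1000.0, 1001.0, 1002.0]])

# naive softmax: exp(1000) overflows float32 to inf, and inf / inf gives NaN
naive = x.exp() / x.exp().sum(dim=-1, keepdim=True)
print(naive)   # tensor([[nan, nan, nan]])

# shifted softmax: subtract the row max first, as in the DAB-DETR line above;
# exponents become -2, -1, 0, so the result is finite and unchanged in value
shifted = (x - x.max(dim=-1, keepdim=True)[0]).exp()
stable = shifted / shifted.sum(dim=-1, keepdim=True)
print(stable)  # tensor([[0.0900, 0.2447, 0.6652]])

(torch.nn.functional.softmax already applies this shift internally, which is consistent with the explicit version having little effect on final performance.)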

Li-Qingyun commented 2 years ago

> It is a trick to stabilize the training of attention modules. It has little influence on the final performance.

Thanks for your reply.