fundamentalvision / Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Apache License 2.0

Initialization for bias parameters of the attention weights seems to be wrong #44

Open chrisway613 opened 3 years ago

chrisway613 commented 3 years ago

Referring to the paper's description: 'Bias parameters of the linear projection are initialized to make A(mlqk) = 1/LK'

But in the code implementation: https://github.com/fundamentalvision/Deformable-DETR/blob/11169a60c33333af00a4849f1808023eba96a931/models/ops/modules/ms_deform_attn.py#L72 we can see it is actually initialized to 0!

Since the weight parameters are also initialized to 0, this would make the final output all zeros.. Am I wrong.. so confused~

lmc8133 commented 2 years ago

Have you run the code after your modification? What's the difference in performance?

Xuer0313 commented 2 years ago

https://zhuanlan.zhihu.com/p/378418174 Hope it will help you

Dstudying commented 1 year ago

I saw your column post on Zhihu. Regarding this question, I looked at the code carefully. This is actually just a plain initialization:
https://github.com/fundamentalvision/Deformable-DETR/blob/11169a60c33333af00a4849f1808023eba96a931/models/ops/modules/ms_deform_attn.py#L71-L72 It is in forward that A(mlqk) = 1/LK is actually produced: https://github.com/fundamentalvision/Deformable-DETR/blob/11169a60c33333af00a4849f1808023eba96a931/models/ops/modules/ms_deform_attn.py#L99-L100