Open chrisway613 opened 3 years ago
Have you run the code after your modification? What is the difference in performance?
https://zhuanlan.zhihu.com/p/378418174 Hope it will help you
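I came across your column on Zhihu. Regarding this question, I took a careful look at the code. In fact, this part is just a simple initialization.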
https://github.com/fundamentalvision/Deformable-DETR/blob/11169a60c33333af00a4849f1808023eba96a931/models/ops/modules/ms_deform_attn.py#L71-L72
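It is only in forward, at the lines below, that A(mlqk) = 1/LK is actually obtained: the softmax there turns the zero-initialized outputs into uniform weights.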
https://github.com/fundamentalvision/Deformable-DETR/blob/11169a60c33333af00a4849f1808023eba96a931/models/ops/modules/ms_deform_attn.py#L99-L100
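To make the point concrete, here is a minimal sketch in plain Python (not the repository's code). It assumes a single attention head, and the sizes `n_levels`, `n_points`, `d_model` and the helper names `linear` and `softmax` are illustrative choices of mine. It shows that a linear projection with all-zero weight and bias yields all-zero logits for any query, and that a softmax over those logits gives every sampling point the weight 1/(L*K).

```python
import math

def linear(x, weight, bias):
    # Plain matrix-vector product plus bias: one output per weight row.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def softmax(logits):
    # Numerically stable softmax over a flat list of logits.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative sizes (assumptions, not taken from the thread).
n_levels, n_points, d_model = 4, 4, 8
out_dim = n_levels * n_points

# Zero initialization of both weight and bias, as discussed above.
weight = [[0.0] * d_model for _ in range(out_dim)]
bias = [0.0] * out_dim

# An arbitrary non-zero query: the logits are zero regardless of its values.
query = [0.3 * i - 1.0 for i in range(d_model)]

logits = linear(query, weight, bias)
attn = softmax(logits)

print(attn[0], 1.0 / (n_levels * n_points))
```

So the zero initialization does not zero out the attention weights themselves; after the softmax they start out uniform, and they only move away from uniform once training updates the projection.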
Referring to the paper's description: 'Bias parameters of the linear projection are initialized to make A(mlqk) = 1/LK'
But in the code implementation: https://github.com/fundamentalvision/Deformable-DETR/blob/11169a60c33333af00a4849f1808023eba96a931/models/ops/modules/ms_deform_attn.py#L72 we can see it was actually initialized to 0!
Since the weight parameters are also initialized to 0, this would make the final output all zeros. Am I wrong? I'm quite confused.