Initialization for bias parameters of the attention weights seems to be wrong

fundamentalvision / Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.

Apache License 2.0


Open chrisway613 opened 3 years ago

chrisway613 commented 3 years ago

Referring to the paper's description: 'Bias parameters of the linear projection are initialized to make A_mlqk = 1/LK'

Since the weight parameters are also initialized to 0, this would make the final output all zeros. Am I wrong? I'm confused.
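The question turns on whether a zero-initialized projection produces zero attention weights. Below is a minimal sketch, assuming (as in the repository's `MSDeformAttn` module) that the projection's logits for one head pass through a softmax over all L·K sampling points. When both the weight and the bias are 0, every logit is 0, so the softmax assigns exactly 1/LK to each point. The aggregated output is then the mean of the sampled values, not zero. The sizes, the query, and the sampled values are made up for illustration. `linear` and `softmax` are hypothetical helpers, not the repository's code.

```python
import math

def linear(x, weight, bias):
    """y = W x + b for a single query vector x (plain lists of floats)."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical sizes: L feature levels, K sampling points per level, C query channels.
L, K, C = 4, 4, 8

# Zero-initialized weight and bias of the attention-weight projection (one head).
weight = [[0.0] * C for _ in range(L * K)]
bias = [0.0] * (L * K)

query = [0.3 * (i + 1) for i in range(C)]  # any query vector gives the same result
logits = linear(query, weight, bias)       # all zeros, independent of the query
attn = softmax(logits)                     # uniform: every entry is 1 / (L*K)

# Hypothetical sampled values, one scalar per (level, point) pair.
sampled = [float(i) for i in range(L * K)]
output = sum(a * v for a, v in zip(attn, sampled))

print(attn[0], 1 / (L * K))  # 0.0625 0.0625
print(output)                # 7.5, the mean of the sampled values, not 0
```

Because softmax is shift-invariant, any constant bias (not only 0) combined with zero weights gives the same uniform 1/LK weights. A bias of 0 is simply the most convenient constant to use.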

lmc8133 commented 2 years ago

Have you run the code after your modification? What's the difference in performance?

Xuer0313 commented 2 years ago

Dstudying commented 1 year ago