facebookresearch / detr

End-to-End Object Detection with Transformers
Apache License 2.0

Why are ResNet features multiplied by 0.1 in the Colab notebook demo? #592

Open MLDeS opened 11 months ago

MLDeS commented 11 months ago

@fmassa

Why are the ResNet features multiplied by 0.1 before the spatial positional encodings are added?

jhairgallardo commented 9 months ago

Did you find an explanation? I am also confused about that 0.1.

Edit: I found the answer here: "The 0.1 rescaling that you are referring to is used to properly scale the contributions of the feature encoding with respect to the position encoding. It is not strictly required, but we have found it helpful to apply it."
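For anyone landing here later, a small sketch of what that rescaling does numerically may help. It is plain Python, not the notebook's code: the `combine` helper, the hidden size of 256, and the magnitudes chosen for `feat` and `pos` are all assumptions for illustration, not values measured from the actual backbone. It only shows that multiplying the features by 0.1 cuts their weight relative to the positional term by a factor of ten before the two are summed.

```python
import math
import random


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def combine(pos, feat, feature_scale=0.1):
    """Hypothetical helper: per-token sum of position and rescaled features."""
    return [p + feature_scale * f for p, f in zip(pos, feat)]


random.seed(0)
hidden_dim = 256  # assumed hidden size for this sketch

# Assumed magnitudes, for illustration only:
# positional values drawn uniformly from [0, 1),
# feature values drawn from a wider Gaussian.
pos = [random.random() for _ in range(hidden_dim)]
feat = [random.gauss(0.0, 5.0) for _ in range(hidden_dim)]

# How strongly the features dominate the positional term, before and after.
ratio_unscaled = l2_norm(feat) / l2_norm(pos)
ratio_scaled = l2_norm([0.1 * f for f in feat]) / l2_norm(pos)

# The vector that would be handed to the transformer for this one token.
token = combine(pos, feat)

print(f"feature/position norm ratio without rescaling: {ratio_unscaled:.2f}")
print(f"feature/position norm ratio with 0.1 rescaling: {ratio_scaled:.2f}")
```

If the real features are much larger than the positional encodings, the unscaled sum is dominated by the features and the position signal is comparatively weak, which is consistent with the explanation quoted above. Whether 0.1 is the best factor is not something this sketch can tell you.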

MLDeS commented 9 months ago

Thanks, @jhairgallardo