Open MLDeS opened 11 months ago
did you find out an explanation? I am also confused about that 0.1
Edit: I found the answer here The 0.1 rescaling that you are referring to is used to properly scale the contributions of the feature encoding with respect to the position encoding. It is not strictly required, but we have found helpful to apply it.
Thanks, @jhairgallardo
@fmassa
Why are the Resnet features multiplied with 0.1 before adding the spatial positional encodings?