peterant330 opened this issue 1 year ago
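`sampling_offsets.bias` is not frozen during training, because the `no_grad` here does not take effect: `torch.no_grad()` only disables gradient tracking for the operations executed inside it, and the bias parameter itself still has `requires_grad=True`.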
As for the initialization: in simple terms, it places the sampling points on a circle around the query point.
You can watch this video for more information about deformable attention.
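A minimal pure-Python sketch of the direction part of that initialization, assuming the linked lines follow the Deformable DETR `_reset_parameters` pattern: one angle per head, `theta_k = k * 2π / n_heads`, with each `(cos, sin)` pair then divided by its larger absolute coordinate. The helper name `head_directions` is hypothetical. Because of that max-abs normalization, the directions strictly land on the boundary of the square `[-1, 1]²` rather than on a true circle.

```python
import math


def head_directions(n_heads):
    """Hypothetical helper: one direction per attention head, normalized to the unit square."""
    dirs = []
    for k in range(n_heads):
        theta = k * 2.0 * math.pi / n_heads  # evenly spaced angles, one per head
        x, y = math.cos(theta), math.sin(theta)
        scale = max(abs(x), abs(y))  # rescale so that max(|x|, |y|) == 1
        dirs.append((x / scale, y / scale))
    return dirs


for d in head_directions(8):
    print(tuple(round(c, 3) for c in d))
```

With 8 heads the directions point to the four edge midpoints and four corners of the square, so the "circle" is formed across heads, not across the points of a single head.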
Thanks for your explanation. I guess you want the sampling points to form a circle around the query. However, I don't understand why `thetas` has length `n_heads` rather than `n_points`, or what the for loop does. If there is only one head but multiple sampling points, then I guess the n points would form a line starting from the reference point.
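A sketch that checks this intuition, assuming the linked code follows the Deformable DETR pattern: one direction per head, the same direction repeated for every feature level, and the for loop multiplying point `i` by `i + 1`. The function `init_sampling_offsets` is a hypothetical pure-Python port; in the PyTorch code the same grid is built with tensor operations and then flattened into the bias vector.

```python
import math


def init_sampling_offsets(n_heads, n_levels, n_points):
    """Hypothetical port: return offsets[head][level][point] = (dx, dy)."""
    grid = []
    for k in range(n_heads):
        theta = k * 2.0 * math.pi / n_heads  # one direction per head, not per point
        x, y = math.cos(theta), math.sin(theta)
        scale = max(abs(x), abs(y))  # normalize to the unit square
        x, y = x / scale, y / scale
        # Every level reuses the head's direction; point i is placed (i + 1)
        # steps out along it, which is what the for loop does.
        grid.append([[(x * (i + 1), y * (i + 1)) for i in range(n_points)]
                     for _ in range(n_levels)])
    return grid


# One head, four points: the points lie on a single ray from the reference point.
print(init_sampling_offsets(n_heads=1, n_levels=1, n_points=4)[0][0])
```

Under these assumptions the reading above is right: within one head the points form a line, and the ring shape only appears when the i-th point of every head is viewed together, giving concentric rings at distances 1, 2, ..., `n_points`.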
Hi, this is really cool work, but I have some difficulty understanding this code:
https://github.com/czczup/ViT-Adapter/blob/968f6b008bdc4f84e2a637c986acc139b38e8083/detection/ops/modules/ms_deform_attn.py#L66-L72
I am curious about the mechanism behind how you initialize `sampling_offsets.bias`, and why it is frozen during training.