I'm curious about the reasoning behind adding the positional embedding to the q and k, but not to the v, in both self- and cross-attention. Also, is the positional embedding re-added in every attention block, and if so, why? Looking forward to further insights, and thank you in advance!
https://github.com/Junelin2333/LanGuideMedSeg-MICCAI2023/blob/c96a272bc1b49c55b27696ebdeb7a6e93ac62e29/utils/layers.py#L70C8-L81C46
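For context, here is a minimal sketch (in numpy, not the repo's actual PyTorch code) of the pattern I'm asking about: position is added to q and k so it influences the attention weights, while v is left position-free so the aggregated output stays purely content-based. The function and variable names here are my own, just for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_pos(x, pos):
    """Scaled dot-product attention where the positional embedding
    is added to q and k only; v stays position-free."""
    q = x + pos   # position informs *where* to attend...
    k = x + pos
    v = x         # ...but not *what* gets aggregated
    d = x.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d))  # (tokens, tokens)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))     # 5 tokens, dim 8 (toy sizes)
pos = rng.normal(size=(5, 8))   # hypothetical positional embedding
out = attention_with_pos(x, pos)
print(out.shape)  # (5, 8)
```

My question is whether this split (q/k get position, v does not) is a deliberate design choice here, and whether `pos` is injected like this inside every attention block rather than once at the input.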