Closed: laisimiao closed this issue 1 year ago
Thank you for your interest. It's common practice for hierarchical vision transformers that use local attention with relative position biases to omit absolute positional encoding (e.g., Swin), and we simply followed that convention.
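For context, here is a minimal sketch of the Swin-style mechanism being referred to. This is illustrative code, not taken from this repo: the class name and all hyperparameters are assumptions. The point is that a learned bias indexed by the *relative* offset between query and key tokens inside each window is added directly to the attention logits, so position information is already injected and an absolute positional encoding becomes redundant.

```python
import torch
import torch.nn as nn

class WindowAttentionWithRelBias(nn.Module):
    """Illustrative sketch (not this repo's code): Swin-style window
    attention where a learned relative position bias replaces absolute PE."""
    def __init__(self, dim, window_size, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

        # One learnable bias per (relative offset, head); offsets inside a
        # Wh x Ww window range over (2*Wh-1) * (2*Ww-1) distinct values.
        Wh, Ww = window_size
        self.rel_bias_table = nn.Parameter(
            torch.zeros((2 * Wh - 1) * (2 * Ww - 1), num_heads))

        # Precompute a lookup index for every query/key pair in the window.
        coords = torch.stack(torch.meshgrid(
            torch.arange(Wh), torch.arange(Ww), indexing="ij"))  # 2, Wh, Ww
        coords = coords.flatten(1)                               # 2, N
        rel = coords[:, :, None] - coords[:, None, :]            # 2, N, N
        rel = rel.permute(1, 2, 0).contiguous()                  # N, N, 2
        rel[:, :, 0] += Wh - 1   # shift offsets to be non-negative
        rel[:, :, 1] += Ww - 1
        rel[:, :, 0] *= 2 * Ww - 1
        self.register_buffer("rel_index", rel.sum(-1))           # N, N

    def forward(self, x):
        # x: (num_windows * B, N, C) with N = Wh * Ww tokens per window
        B_, N, C = x.shape
        qkv = (self.qkv(x)
               .reshape(B_, N, 3, self.num_heads, -1)
               .permute(2, 0, 3, 1, 4))
        q, k, v = qkv[0], qkv[1], qkv[2]

        attn = (q * self.scale) @ k.transpose(-2, -1)            # B_, heads, N, N
        bias = self.rel_bias_table[self.rel_index.view(-1)]      # N*N, heads
        bias = bias.view(N, N, -1).permute(2, 0, 1)              # heads, N, N
        attn = (attn + bias.unsqueeze(0)).softmax(dim=-1)        # bias on logits

        x = (attn @ v).transpose(1, 2).reshape(B_, N, C)
        return self.proj(x)
```

Because the bias depends only on the relative offset between two tokens within a window (not on their absolute coordinates), it also generalizes across window positions, which is part of why Swin-style models drop the absolute encoding entirely.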
Closing this due to inactivity. If you still have questions, feel free to open it back up.