Closed: laisimiao closed this issue 1 year ago
Thank you for your interest. It's common practice for hierarchical vision transformers that use local attention with relative position biases to omit absolute positional encoding (e.g., Swin), and we simply followed that convention.
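For context, here is a minimal sketch of the Swin-style mechanism being referred to. This is illustrative code, not taken from this repo: the class name and all hyperparameters are assumptions. The point is that a learned bias indexed by the *relative* offset between query and key tokens inside each window is added directly to the attention logits, so position information is already injected and an absolute positional encoding becomes redundant.

```python
import torch
import torch.nn as nn

class WindowAttentionWithRelBias(nn.Module):
    """Illustrative sketch (not this repo's code): Swin-style window
    attention where a learned relative position bias replaces absolute PE."""
    def __init__(self, dim, window_size, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

        # One learnable bias per (relative offset, head); offsets inside a
        # Wh x Ww window range over (2*Wh-1) * (2*Ww-1) distinct values.
        Wh, Ww = window_size
        self.rel_bias_table = nn.Parameter(
            torch.zeros((2 * Wh - 1) * (2 * Ww - 1), num_heads))

        # Precompute a lookup index for every query/key pair in the window.
        coords = torch.stack(torch.meshgrid(
            torch.arange(Wh), torch.arange(Ww), indexing="ij"))  # 2, Wh, Ww
        coords = coords.flatten(1)                               # 2, N
        rel = coords[:, :, None] - coords[:, None, :]            # 2, N, N
        rel = rel.permute(1, 2, 0).contiguous()                  # N, N, 2
        rel[:, :, 0] += Wh - 1   # shift offsets to be non-negative
        rel[:, :, 1] += Ww - 1
        rel[:, :, 0] *= 2 * Ww - 1
        self.register_buffer("rel_index", rel.sum(-1))           # N, N

    def forward(self, x):
        # x: (num_windows * B, N, C) with N = Wh * Ww tokens per window
        B_, N, C = x.shape
        qkv = (self.qkv(x)
               .reshape(B_, N, 3, self.num_heads, -1)
               .permute(2, 0, 3, 1, 4))
        q, k, v = qkv[0], qkv[1], qkv[2]

        attn = (q * self.scale) @ k.transpose(-2, -1)            # B_, heads, N, N
        bias = self.rel_bias_table[self.rel_index.view(-1)]      # N*N, heads
        bias = bias.view(N, N, -1).permute(2, 0, 1)              # heads, N, N
        attn = (attn + bias.unsqueeze(0)).softmax(dim=-1)        # bias on logits

        x = (attn @ v).transpose(1, 2).reshape(B_, N, C)
        return self.proj(x)
```

Because the bias depends only on the relative offset between two tokens within a window (not on their absolute coordinates), it also generalizes across window positions, which is part of why Swin-style models drop the absolute encoding entirely.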
Closing this due to inactivity. If you still have questions, feel free to open it back up.