I am attempting to reproduce your theory on a general attention mechanism, specifically by replacing softmax with flatten. However, I am having difficulty understanding the improvements made in Swin Transformer (SwinT) and Pyramid Vision Transformer (PVT). Could you provide a common implementation form that covers both?
thanks a lot!
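For reference, the generic form I currently have in mind is the minimal sketch below: scaled dot-product attention with a swappable normalization, so softmax can be replaced by another map (here an identity stand-in for the flatten variant). All names here are my own assumptions, not your actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, normalize=softmax):
    """Generic single-head attention, q/k/v of shape (n, d).

    `normalize` is the pluggable replacement point: pass softmax for
    standard attention, or e.g. an identity map to drop softmax
    entirely (my guess at the "flatten" variant).
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = (q @ k.T) * scale        # (n, n) similarity matrix
    return normalize(scores, axis=-1) @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out_softmax = attention(q, k, v)
out_linear = attention(q, k, v, normalize=lambda s, axis=-1: s)
print(out_softmax.shape, out_linear.shape)
```

Is this the right abstraction point, and where do the SwinT window partitioning and the PVT spatial-reduction step fit into it?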