Amshaker / SwiftFormer

[ICCV'23] Official repository of the paper "SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications"

Question about the softmax in EfficientAdditiveAttnetion #4

Closed: yaozengwei closed this issue 1 year ago

yaozengwei commented 1 year ago

I suspect there is a mismatch between the code and the paper. At line 175, the shape of A is (B, H*W, 1). I think it should be A = A.softmax(dim=1), so that the softmax operation is applied over the spatial dimension (i.e., H*W). Please correct me if I'm mistaken.

https://github.com/Amshaker/SwiftFormer/blob/075daf69f8959052dfaf7a1e537009304a17f9ce/models/swiftformer.py#L172-L177
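One quick way to see the problem (a toy reproduction, not code from the repo): because the last dimension of A has size 1, softmax(dim=-1) normalizes each score against only itself, so every weight collapses to 1 and the attention becomes a no-op.

```python
import torch

B, N = 2, 4                 # toy batch size and token count (N = H*W)
A = torch.randn(B, N, 1)    # one attention score per spatial token

# dim=-1 normalizes over a dimension of size 1: every weight becomes 1.0
print(A.softmax(dim=-1))            # tensor of all ones

# dim=1 normalizes across the H*W tokens, as the paper describes
print(A.softmax(dim=1).sum(dim=1))  # each batch element sums to 1
```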

Amshaker commented 1 year ago

Hi @yaozengwei ,

Yes, you are right. The softmax should be applied over the second dimension; this is an implementation issue. You can replace A = A.softmax(dim=-1) with either A = A.softmax(dim=1) or A = torch.nn.functional.normalize(A, dim=1).
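For reference, a minimal sketch of the corrected weight computation under the shapes discussed above. The names w_g and scale_factor mirror the linked snippet, but the standalone function form here is only an illustration, not the repo's actual method:

```python
import torch

def additive_attention_weights(query, w_g, scale_factor):
    # query: (B, N, D) with N = H*W tokens; w_g: (D, 1) learnable vector
    query_weight = query @ w_g        # (B, N, 1): one scalar score per token
    A = query_weight * scale_factor   # scaled scores
    A = A.softmax(dim=1)              # normalize over the N tokens, not the
                                      # singleton last dimension
    return A                          # weights sum to 1 per batch element

# Example with assumed toy shapes:
A = additive_attention_weights(torch.randn(2, 16, 64),
                               torch.randn(64, 1),
                               64 ** -0.5)
print(A.sum(dim=1))  # ones
```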

I will update the model weights and code soon with this change.

Best regards, Abdelrahman.

yaozengwei commented 1 year ago

Thanks for your quick response!