jialeli1 opened this issue 3 years ago
Hi, good question. I do not think it is wrong. Please pay attention to the dimension along which the normalization is applied, which is different from the original self-attention.
I think @jialeli1 is right. If you don't transpose the attention matrix before the matrix product, the product makes no sense (pay attention to what each dimension means). My guess is that because the author didn't transpose the attention matrix, he needed the normalization proposed in the paper. However, if you do transpose the attention matrix and apply the normalization from the original attention paper, you will find that the proposed normalization is not necessary. I have re-implemented the segmentation code in PyTorch and got quite a good result.
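For what it's worth, here is a minimal PyTorch sketch of the two options discussed above. This is not the repository's actual code: the tensor names and the shapes `B`, `C`, `N` are my own assumptions, and the column-wise softmax in Option 2 is only a stand-in for the normalization proposed in the paper.

```python
import torch

B, C, N = 2, 64, 1024        # assumed: batch size, channels, number of points
q = torch.randn(B, N, C)     # queries, one row per output point
k = torch.randn(B, C, N)     # keys
v = torch.randn(B, C, N)     # values

energy = torch.bmm(q, k)     # (B, N, N); energy[b, i, j] relates query i to key j

# Option 1: standard softmax over the key dimension, then transpose before the
# product, so each output point becomes a convex combination of value vectors.
attn_rows = torch.softmax(energy, dim=-1)                   # each row sums to 1
out_transposed = torch.bmm(v, attn_rows.transpose(1, 2))    # (B, C, N)

# Option 2: keep the matrix orientation but normalize over dim=1 instead, so
# the columns that bmm consumes as weights already sum to 1 (a stand-in for
# the paper's normalization, not an exact reproduction of it).
attn_cols = torch.softmax(energy, dim=1)                    # each column sums to 1
out_untransposed = torch.bmm(v, attn_cols)                  # (B, C, N)

# The two options are not numerically identical, but in both cases every
# output point is weighted over the input points with weights summing to 1.
print(out_transposed.shape, out_untransposed.shape)
```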
@JunweiZheng93 Could you share your implementation code? Thank you so much
Hi.
As shown here, the attention matrix should be transposed before the matrix product, if I understand it correctly.
Here is my draft calculation of the dimensions in the matrix product.
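In case the attached draft does not render, here is a rough version of the same dimension bookkeeping as a PyTorch snippet (the shapes `B`, `C`, `N` are assumed, not taken from the repository):

```python
import torch

B, C, N = 2, 64, 1024                 # assumed batch size, channels, points
v = torch.randn(B, C, N)              # values:           (B, C, N)
attention = torch.rand(B, N, N)       # attention matrix: (B, N, N)

# Without the transpose: out[:, :, j] = sum_i v[:, :, i] * attention[:, i, j],
# i.e. output point j is weighted by *column* j of the attention matrix.
out = torch.bmm(v, attention)                       # (B, C, N) x (B, N, N) -> (B, C, N)

# With the transpose: out_t[:, :, j] = sum_i v[:, :, i] * attention[:, j, i],
# i.e. output point j is weighted by *row* j, which is the axis a standard
# softmax(dim=-1) normalizes to 1.
out_t = torch.bmm(v, attention.transpose(1, 2))     # also (B, C, N)
```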