DeLightCMU / PSA

This is an official implementation of "Polarized Self-Attention: Towards High-quality Pixel-wise Regression"
Apache License 2.0

It seems that the implementations of Channel-only self-attention and Spatial-only self-attention are swapped with each other. #5

Open · developer0hye opened this issue 2 years ago

developer0hye commented 2 years ago

Thanks for sharing your work!

https://github.com/DeLightCMU/PSA/blob/588b370d9f240d38832061a70c275cb6eb81232e/semantic-segmentation/network/PSA.py#L64-L95


It seems that the spatial_pool function is the same as the Channel-only Self-Attention module.
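
For reference, here is a minimal sketch of what the paper's Channel-only Self-Attention branch computes, which is what the linked spatial_pool code appears to match. The class and layer names here are assumptions for illustration, not the repo's actual code:

```python
import torch
import torch.nn as nn

class ChannelOnlySelfAttention(nn.Module):
    """Sketch of a channel-only self-attention branch in the spirit of the paper.
    Layer names and exact choices are assumptions, not the repo's code."""
    def __init__(self, channels):
        super().__init__()
        self.wq = nn.Conv2d(channels, 1, kernel_size=1)               # query: C -> 1
        self.wv = nn.Conv2d(channels, channels // 2, kernel_size=1)   # value: C -> C/2
        self.softmax = nn.Softmax(dim=-1)                              # over the HW positions
        self.wz = nn.Conv2d(channels // 2, channels, kernel_size=1)   # back to C channels
        self.ln = nn.LayerNorm([channels, 1, 1])
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        b, c, h, w = x.size()
        q = self.softmax(self.wq(x).view(b, 1, h * w))   # (B, 1, HW), softmax over HW
        v = self.wv(x).view(b, c // 2, h * w)            # (B, C/2, HW)
        z = torch.matmul(v, q.transpose(1, 2))           # (B, C/2, 1)
        z = self.wz(z.view(b, c // 2, 1, 1))             # (B, C, 1, 1)
        return x * self.sigmoid(self.ln(z))              # channel attention applied to x

# quick shape check
print(ChannelOnlySelfAttention(64)(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```

Note the output of this branch is a per-channel weight (B, C, 1, 1), which is why it reads as "channel-only" attention even though the softmax runs over the spatial positions.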

khoshsirat commented 2 years ago

Can you explain in more detail what you mean?

developer0hye commented 2 years ago

@khoshsirat

Does the spatial_pool function correspond to Channel-only Self-Attention?

Does the channel_pool function correspond to Spatial-only Self-Attention?

khoshsirat commented 2 years ago

OK, I see it now: The spatial_pool function should be renamed to channel_pool and the channel_pool function should be renamed to spatial_pool.

I have found another discrepancy too: In the channel_pool function (which should be renamed to spatial_pool), softmax is called after matmul. But in the paper, in the Spatial-only Self-attention block, softmax is used before matmul.
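
For comparison, here is a minimal sketch of a Spatial-only branch with the softmax applied before the matmul, as the paper describes. Again, the class and layer names are assumptions for illustration, not the repo's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialOnlySelfAttention(nn.Module):
    """Sketch of a spatial-only self-attention branch with softmax before matmul,
    as described in the paper. Names and layer choices are assumptions."""
    def __init__(self, channels):
        super().__init__()
        self.wq = nn.Conv2d(channels, channels // 2, kernel_size=1)   # query: C -> C/2
        self.wv = nn.Conv2d(channels, channels // 2, kernel_size=1)   # value: C -> C/2
        self.pool = nn.AdaptiveAvgPool2d(1)                           # global pooling of the query
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        b, c, h, w = x.size()
        q = self.pool(self.wq(x)).view(b, 1, c // 2)   # (B, 1, C/2)
        q = F.softmax(q, dim=-1)                        # softmax over channels, BEFORE the matmul
        v = self.wv(x).view(b, c // 2, h * w)           # (B, C/2, HW)
        z = torch.matmul(q, v).view(b, 1, h, w)         # (B, 1, H, W) spatial attention map
        return x * self.sigmoid(z)
```

The output here is a per-position weight (B, 1, H, W), which is what makes it the spatial branch; applying the softmax after the matmul instead would change the resulting attention values.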

by-seong-me commented 2 years ago

@khoshsirat You are right. The location of the softmax operation in the channel_pool function is different from the paper's description. What's going on? Which one is correct?

chunchet-ng commented 1 year ago

Hi guys, I have created a gist to compare this implementation against External-Attention-pytorch's. Through a simple test case, I found that the outputs are different with Kaiming initialization.

Any idea why?
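
For context, here is a minimal sketch of the kind of output comparison such a gist might run; the helper and the demo module below are placeholders, not the actual gist:

```python
import torch
import torch.nn as nn

def outputs_match(mod_a: nn.Module, mod_b: nn.Module, x: torch.Tensor) -> bool:
    # Copy mod_a's weights into mod_b where parameter names and shapes line up,
    # then compare the two outputs on the same input.
    mod_b.load_state_dict(mod_a.state_dict(), strict=False)
    mod_a.eval()
    mod_b.eval()
    with torch.no_grad():
        return torch.allclose(mod_a(x), mod_b(x), atol=1e-6)

# Demo with a trivial module; in the actual test the two PSA implementations
# (this repo's and External-Attention-pytorch's) would be passed instead.
# If the two implementations name their parameters differently, the weight copy
# above is only partial, and differing outputs can follow from that alone,
# independent of the Kaiming initialization.
x = torch.randn(1, 64, 32, 32)
print(outputs_match(nn.Conv2d(64, 64, 1), nn.Conv2d(64, 64, 1), x))
```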