Question about the pseudo code of the LSDA.

Thanks for your excellent work!

I have a question about the pseudo code of the LSDA, which is implemented in only ten lines of code using nothing but reshape and permute operations:

    if type == "SDA":
        x = x.reshape(H // G, G, W // G, G, D).permute(0, 2, 1, 3, 4)
    elif type == "LDA":
        x = x.reshape(G, H // G, G, W // G, D).permute(1, 3, 0, 2, 4)

Although the two branches clearly reshape the tensor differently, I still don't understand the reasoning behind this design. Could you explain, from another perspective, why these two reshape patterns correspond to the two different kinds of attention (long-distance versus short-distance)?
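To make the question concrete, here is a minimal sketch that applies both reshapes to a small grid of position ids and lists which positions end up in the same group. It is only an illustration under a few assumptions: it uses NumPy instead of PyTorch (so `transpose` stands in for `permute`), it drops the embedding dimension D, and the helper name `group_positions` is hypothetical and not part of the repository.

```python
import numpy as np


def group_positions(H, W, G, kind):
    """Return an array of shape (num_groups, G * G).

    Each row lists the flat positions (row * W + col) that are placed
    in the same group by the corresponding reshape + permute.
    """
    # Position ids stand in for the D-dimensional embeddings.
    x = np.arange(H * W).reshape(H, W)
    if kind == "SDA":
        # Split each axis into (block index, offset inside block).
        g = x.reshape(H // G, G, W // G, G).transpose(0, 2, 1, 3)
    elif kind == "LDA":
        # Split each axis into (sample index, offset inside interval).
        g = x.reshape(G, H // G, G, W // G).transpose(1, 3, 0, 2)
    else:
        raise ValueError(f"unknown kind: {kind}")
    # The first two axes now index the group, the last two its G x G members.
    return g.reshape(-1, G * G)


sda = group_positions(4, 4, 2, "SDA")
lda = group_positions(4, 4, 2, "LDA")
print(sda[0].tolist())  # [0, 1, 4, 5]
print(lda[0].tolist())  # [0, 2, 8, 10]
```

With H = W = 4 and G = 2, the first SDA group holds the adjacent positions of the top-left 2 x 2 block, while the first LDA group holds positions sampled with a stride of H // G = 2 in each direction. Is this the intended reading of the two branches?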

Thanks a lot!

cheerss / CrossFormer

Question about the pseudo code of the LSDA. #18