I have a question about the pseudocode of LSDA, which is implemented in only ten lines of code using nothing but reshape and permute operations:
if type == "SDA":
    x = x.reshape(H // G, G, W // G, G, D).permute(0, 2, 1, 3, 4)
elif type == "LDA":
    x = x.reshape(G, H // G, G, W // G, D).permute(1, 3, 0, 2, 4)
I can see that the two branches differ in how they reshape the tensor, but I am still unsure about the reason for this particular design. Could you explain, perhaps from another perspective, why these two reshape/permute patterns correspond to long-distance and short-distance attention, respectively?
Thanks for your excellent work!
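To make the question concrete, here is a small runnable sketch I used to inspect which spatial positions end up in the same attention group under each branch. The values of H, W, and G are arbitrary small numbers I picked for illustration, D is kept at 1 so each token is just its own flat index, and NumPy's transpose stands in for permute:

```python
import numpy as np

H = W = 6   # feature-map size (arbitrary; any multiple of G works)
G = 2       # group size for SDA / number of groups per axis for LDA
D = 1       # embedding dim kept at 1 so each token is just its index

# Label every spatial position with its flat index 0..H*W-1.
x = np.arange(H * W, dtype=float).reshape(H, W, D)

# SDA: split each axis as (H//G, G); after the transpose the two size-G
# axes are innermost, so each group is a contiguous G x G window.
sda = x.reshape(H // G, G, W // G, G, D).transpose(0, 2, 1, 3, 4)
sda_groups = sda.reshape(-1, G * G)   # one row per attention group
print(sda_groups[0])   # -> [0. 1. 6. 7.]  (adjacent neighbours)

# LDA: split each axis as (G, H//G); the innermost size-G axes now
# index positions that are H//G (resp. W//G) apart on the feature map.
lda = x.reshape(G, H // G, G, W // G, D).transpose(1, 3, 0, 2, 4)
lda_groups = lda.reshape(-1, G * G)
print(lda_groups[0])   # -> [ 0.  3. 18. 21.]  (far-apart positions)
```

If I read the output correctly, an SDA group covers a compact G x G window (short distance), while an LDA group samples positions at a fixed stride of H//G across the whole map (long distance), and the permute only decides which axes become the group dimension.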