Hello, I noticed that in your code you use ChannelAttention and SpatialAttention in place of the cross-attention layer described in the original paper. Was this substitution made to reduce the computational cost of cross-attention?