Hi,
I have a question regarding the computation of cross-attention in https://github.com/apple/ml-cvnets/blob/main/cvnets/layers/linear_attention.py#L163
Here, the Query and Key are generated from the input `x_prev`, while the Value is generated from the input `x`. In general, however, the Query is generated from one of the inputs, and the other input is used to generate both the Key and the Value, as in this diagram: https://vaclavkosar.com/images/cross-attention-in-transformer-architecture.png
Can you please help me understand the idea behind your implementation of cross-attention?
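For reference, here is a minimal sketch of the two conventions as I understand them (hypothetical shapes and separate projection layers, just for illustration, not the repo's actual code):

```python
import torch
import torch.nn as nn

embed_dim = 64

# Separate projections for illustration only; the repo fuses these differently.
W_q = nn.Linear(embed_dim, embed_dim)
W_k = nn.Linear(embed_dim, embed_dim)
W_v = nn.Linear(embed_dim, embed_dim)

x = torch.randn(2, 16, embed_dim)       # current input
x_prev = torch.randn(2, 16, embed_dim)  # the other input

# Convention from the linked diagram:
# Query from one input; Key and Value from the other.
q_std, k_std, v_std = W_q(x), W_k(x_prev), W_v(x_prev)

# What linear_attention.py appears to do:
# Query and Key from x_prev; Value from x.
q_repo, k_repo, v_repo = W_q(x_prev), W_k(x_prev), W_v(x)
```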