Hi,
I have a question regarding the computation of cross-attention in https://github.com/apple/ml-cvnets/blob/main/cvnets/layers/linear_attention.py#L163
Here, the Query and Key are generated from the input `x_prev`, while the Value is generated from the input `x`. In general, however, the Query is generated from one of the inputs, and the other input is used to generate both the Key and the Value, as in this diagram: https://vaclavkosar.com/images/cross-attention-in-transformer-architecture.png
Can you please help me understand the idea behind your implementation of cross-attention?
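For reference, here is a minimal sketch of the two conventions as I understand them (hypothetical shapes and separate projection layers, just for illustration, not the repo's actual code):

```python
import torch
import torch.nn as nn

embed_dim = 64

# Separate projections for illustration only; the repo fuses these differently.
W_q = nn.Linear(embed_dim, embed_dim)
W_k = nn.Linear(embed_dim, embed_dim)
W_v = nn.Linear(embed_dim, embed_dim)

x = torch.randn(2, 16, embed_dim)       # current input
x_prev = torch.randn(2, 16, embed_dim)  # the other input

# Convention from the linked diagram:
# Query from one input; Key and Value from the other.
q_std, k_std, v_std = W_q(x), W_k(x_prev), W_v(x_prev)

# What linear_attention.py appears to do:
# Query and Key from x_prev; Value from x.
q_repo, k_repo, v_repo = W_q(x_prev), W_k(x_prev), W_v(x)
```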