07hyx06 closed this issue 2 years ago
Hi, thank you for your interest in the work, and sorry for the long delay in my response! I have a couple of deadlines over the next few days, so I will definitely try to get back to you by the end of the week!
Yep, that's correct! We first find the centroids (K) by casting attention over the image features (X), and then update the features (X) based on the centroids (K).
According to the pinned issue, the explicit form of duplex attention is:
K = Attention(K, X, X)  or  K = LayerNorm(K + Attention(K, X, X))

X = gamma(Attention(X, K, V)) * w(X) + beta(Attention(X, K, V))

where Y = (K, V).
Am I right?
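For anyone reading along, here is a minimal NumPy sketch of the two steps discussed in this thread. It is not the repository's code. It assumes single-head scaled dot-product attention with no learned query/key/value projections, uses the LayerNorm variant of the K update, and models gamma, beta and w as plain linear maps (`W_gamma`, `W_beta`, `W_x` are hypothetical names chosen for illustration).

```python
import numpy as np


def softmax(z, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: q is (n, d), k and v are (m, d); returns (n, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean and unit variance (no learned scale/shift).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def duplex_attention(X, K, V, W_gamma, W_beta, W_x):
    # Step 1: the latents attend over the image features to find the centroids.
    # K = LayerNorm(K + Attention(K, X, X))
    K = layer_norm(K + attention(K, X, X))

    # Step 2: the image features attend to the centroids and are modulated.
    # X = gamma(A) * w(X) + beta(A), with A = Attention(X, K, V)
    A = attention(X, K, V)
    gamma = A @ W_gamma  # assumption: gamma is a linear map
    beta = A @ W_beta    # assumption: beta is a linear map
    X_new = gamma * (X @ W_x) + beta  # assumption: w is a linear map
    return X_new, K


rng = np.random.default_rng(0)
n, m, d = 16, 4, 8  # 16 image positions, 4 latents, feature width 8
X = rng.normal(size=(n, d))
K = rng.normal(size=(m, d))
V = rng.normal(size=(m, d))
W_gamma, W_beta, W_x = (rng.normal(size=(d, d)) for _ in range(3))

X_new, K_new = duplex_attention(X, K, V, W_gamma, W_beta, W_x)
print(X_new.shape, K_new.shape)
```

Note that the updated K is computed first and then reused as the keys in the second step, which matches the "find the centroids, then update the features" order described above.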