LeapLabTHU / Agent-Attention

Official repository of Agent Attention (ECCV2024)

Why use $A$ as the proxy for $K$? #35

Closed: WateverOk closed this issue 1 month ago

WateverOk commented 3 months ago

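Hello, after reading your paper I have a question I would like to ask. After using $A$ in place of $Q$ to compute $V_A$ from $K$ and $V$, why is $A$ then used in place of $K$, rather than $V_A$? $A$ and $V_A$ are not the same, so can the attention scores computed from $Q$ and $A$ actually be applied to $V_A$?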
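For reference, the two steps discussed in this thread can be written as follows. This is our reading of the paper's formulation, with $\sigma$ denoting a row-wise softmax and the $1/\sqrt{d}$ scaling omitted for brevity. The first line is the agent aggregation step ($A$ plays the role of the queries); the second is the agent broadcast step, where the question is which matrix should play the role of the keys:

$$
V_A = \sigma\!\left(A K^{\top}\right) V, \qquad
O = \sigma\!\left(Q A^{\top}\right) V_A = \sigma\!\left(Q A^{\top}\right)\sigma\!\left(A K^{\top}\right) V
$$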

tian-qing001 commented 2 months ago

Hi @WateverOk, thanks for your insightful question. Intuitively, the agent tokens $A$ align with the query ($Q$) space, while the agent features $V_A$ align with the value ($V$) space. Therefore, in the agent broadcast step, we use $A$ as the keys (since they lie in the same space as $Q$) and $V_A$ as the values to broadcast global information to each query.
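To make the point about spaces concrete, here is a minimal single-head sketch in NumPy. It assumes that the agent tokens are obtained by average-pooling the queries (here, a 1-D pooling over consecutive groups of tokens, which is our simplification). It omits multi-head splitting and any additional terms the official implementation may add. The function name `agent_attention` and its parameters are hypothetical and are not taken from the repository.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def agent_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, num_agents: int):
    """Hypothetical single-head agent attention sketch.

    q, k: (N, d) queries and keys; v: (N, d_v) values.
    Returns the output (N, d_v), agent tokens A (n_a, d), agent features V_A (n_a, d_v).
    """
    n, d = q.shape
    assert n % num_agents == 0, "this sketch assumes N is divisible by num_agents"
    scale = 1.0 / np.sqrt(d)

    # Agent tokens A: average-pooled from Q (assumption), so they live in the query space.
    a = q.reshape(num_agents, n // num_agents, d).mean(axis=1)  # (n_a, d)

    # Step 1, agent aggregation: A acts as queries against (K, V) and yields V_A in value space.
    v_a = softmax(a @ k.T * scale) @ v  # (n_a, d_v)

    # Step 2, agent broadcast: Q is compared with A (query space vs. query space),
    # and the resulting weights mix the agent features V_A (value space).
    out = softmax(q @ a.T * scale) @ v_a  # (N, d_v)
    return out, a, v_a


rng = np.random.default_rng(0)
q = rng.standard_normal((16, 8))
k = rng.standard_normal((16, 8))
v = rng.standard_normal((16, 5))  # d_v = 5 differs from d = 8 on purpose
out, a, v_a = agent_attention(q, k, v, num_agents=4)
print(out.shape, a.shape, v_a.shape)  # (16, 5) (4, 8) (4, 5)
```

With `d_v != d`, as in the usage example, `q @ v_a.T` would not even be defined. More generally, even when the dimensions happen to match, $Q V_A^{\top}$ would compare vectors from the query space with vectors from the value space. By contrast, $Q A^{\top}$ compares like with like.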