Open meteorlin opened 3 months ago
I have the same question. In the open-source implementation on Hugging Face, k still has multiple heads, and k and v are still cached during inference, which is completely different from what the architecture section describes.
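Hello! I read the code you open-sourced on HuggingFace, and the attention part does not seem to implement the merging (absorption) of the Q and K projection matrices described in the paper. Could you point out where the equivalent implementation is?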
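To make clear what I mean by "absorbed", here is a minimal sketch of the identity I would expect to see used. It is not taken from the released code: the names `W_uk`, `c_kv`, and `q_absorbed` and the toy sizes are my own assumptions, it covers a single head, and it ignores positional encoding. It only shows that folding the key up-projection into the query gives the same attention scores while attending directly over the cached latents.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def rand(rows, cols):
    """Random matrix with entries in [-1, 1]."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
d_head, d_latent, seq_len = 4, 3, 5  # toy sizes, for illustration only

W_uk = rand(d_latent, d_head)   # hypothetical up-projection: latent -> per-head key
c_kv = rand(seq_len, d_latent)  # compressed latents (what would be cached)
q = rand(1, d_head)             # one query vector for one head

# Naive path: decompress every key, then take dot products with the query.
k = matmul(c_kv, W_uk)                  # shape (seq_len, d_head)
scores_naive = matmul(q, transpose(k))  # shape (1, seq_len)

# Absorbed path: fold W_uk into the query and score against the latents.
q_absorbed = matmul(q, transpose(W_uk))                # shape (1, d_latent)
scores_absorbed = matmul(q_absorbed, transpose(c_kv))  # shape (1, seq_len)

# Largest elementwise gap between the two score vectors.
max_diff = max(abs(a - b) for a, b in zip(scores_naive[0], scores_absorbed[0]))
```

In the absorbed path the per-head keys `k` are never materialized, so only `c_kv` would need to be kept during inference. That is the behavior I could not find in the released attention module.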