deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
MIT License
3.47k stars 143 forks

The open-source code on HuggingFace does not appear to implement matrix merging #80

Open meteorlin opened 1 month ago

meteorlin commented 1 month ago

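Hello authors! I read the code you open-sourced on HuggingFace, and the attention part does not seem to implement the merging ("absorbed") of the Q and K projection matrices described in the paper. Could you point out where exactly the equivalent implementation is?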

ZxAndJb commented 1 month ago

I have the same question. In the open-source implementation on HuggingFace, K still has multiple heads, and K and V are still saved during inference, which is completely different from what the architecture section of the paper describes.
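
To make the question concrete, here is a minimal sketch of what "absorbing" the key up-projection into the query side would mean for a single head. It is not taken from the HuggingFace code: the names `W_uq`, `W_uk`, `c_q`, `c_kv` and the toy sizes are assumptions for illustration only, and the sketch ignores the decoupled RoPE part, the softmax, and scaling. It uses plain Python so it runs without any dependencies.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def rand_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)

# Hypothetical toy sizes, not the model's real dimensions.
d_head, d_q_latent, d_kv_latent, n_tokens = 4, 3, 2, 5

W_uq = rand_matrix(d_head, d_q_latent, rng)      # query up-projection (one head)
W_uk = rand_matrix(d_head, d_kv_latent, rng)     # key up-projection (one head)
c_q = rand_matrix(d_q_latent, 1, rng)            # compressed query, column vector
c_kv = rand_matrix(d_kv_latent, n_tokens, rng)   # cached latents, one column per token

# Naive path: materialize per-head keys from the latents, then take dot products.
q = matmul(W_uq, c_q)                            # d_head x 1
k = matmul(W_uk, c_kv)                           # d_head x n_tokens
scores_naive = matmul(transpose(q), k)           # 1 x n_tokens

# Absorbed path: fold W_uk into the query side once, then score directly
# against the cached latents without ever building per-head keys.
W_absorbed = matmul(transpose(W_uq), W_uk)       # d_q_latent x d_kv_latent
scores_absorbed = matmul(matmul(transpose(c_q), W_absorbed), c_kv)

max_diff = max(abs(a - b) for a, b in zip(scores_naive[0], scores_absorbed[0]))
print("max difference between the two paths:", max_diff)
```

Under these assumptions the two paths give the same attention scores up to floating-point error, and only `c_kv` would need to be cached in the absorbed path.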