Inquiry about Key/Value Storage and Matrix Merging in DeepSeek-V2 Inference Code

Dear DeepSeek-V2 team,

First of all, thank you for your incredible work on DeepSeek-V2. I have been exploring the model in detail and have two questions about how the inference process is implemented.

The paper states that during inference only the compressed latent vectors for keys and values (c_t^{KV}) are cached. However, in the HuggingFace implementation, `key_states` and `value_states` still appear to be cached separately, at full size, during inference. Could you clarify how this aligns with the approach described in the paper?
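To make clear why this matters to me, here is a rough sketch of the per-token, per-layer cache size in the two cases. The dimensions are my reading of the paper's configuration (128 heads, head dimension 128, latent dimension 512, decoupled RoPE dimension 64), so please treat them as assumptions and correct me if they are wrong. The helper name is my own, not something from your code.

```python
def cache_elems_per_token(n_heads, head_dim, latent_dim, rope_dim):
    """Elements cached per token per layer (hypothetical helper).

    full:   separate keys and values for every head (standard MHA-style cache)
    latent: one compressed vector c_t^{KV} plus the shared decoupled RoPE key
    """
    full = 2 * n_heads * head_dim
    latent = latent_dim + rope_dim
    return full, latent


# Assumed dimensions, taken from my reading of the paper.
full, latent = cache_elems_per_token(n_heads=128, head_dim=128,
                                     latent_dim=512, rope_dim=64)
print(f"full K/V cache: {full} elements per token per layer")
print(f"latent cache:   {latent} elements per token per layer")
print(f"ratio:          {full / latent:.1f}x")
```

If the HuggingFace code caches the full key and value states, I would expect it to pay roughly the first cost rather than the second.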

Additionally, the paper discusses absorbing W^{UV} into W^O and W^{UK} into W^Q for efficiency, so that keys and values never need to be materialized from the latent vectors. However, I could not locate this merging step in the code either. Could you provide some insight, or point me to where this is implemented?
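For reference, this is a minimal sketch of what I understand the merge to mean. It is a toy single-head example without RoPE, softmax, or batching, using small made-up integer matrices so that the comparison is exact. The variable names and sizes are my own assumptions and do not come from your code.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


# Toy sizes (assumed): latent dim 2, head dim 3, model dim 2.
W_UK = [[1, 2], [0, 1], [3, -1]]   # up-projection for keys,   head_dim x latent
W_UV = [[2, 0], [1, 1], [0, -2]]   # up-projection for values, head_dim x latent
W_O = [[1, 0, 2], [-1, 3, 1]]      # output projection, model_dim x head_dim
q = [[1], [2], [-1]]               # query for one head (column vector)
c_kv = [[4], [5]]                  # cached latent vector c_t^{KV}

# Naive path: rebuild the key and value from the latent, then attend.
k = matmul(W_UK, c_kv)
v = matmul(W_UV, c_kv)
score_naive = matmul(transpose(q), k)[0][0]
out_naive = matmul(W_O, v)

# Absorbed path: fold W^{UK} into the query side and W^{UV} into W^O,
# so only c_kv is ever read from the cache.
q_absorbed = matmul(transpose(W_UK), q)
W_O_absorbed = matmul(W_O, W_UV)
score_absorbed = matmul(transpose(q_absorbed), c_kv)[0][0]
out_absorbed = matmul(W_O_absorbed, c_kv)

assert score_naive == score_absorbed
assert out_naive == out_absorbed
```

Since attention is a weighted sum over values, the same identity should carry over to the full multi-token case by linearity. What I cannot find is where the released code performs the equivalent of `q_absorbed` and `W_O_absorbed`.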

Thank you again for your fantastic work, and I really look forward to your guidance on these points.

Best regards, lucas

deepseek-ai / DeepSeek-V2, issue #92