deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
MIT License

How to understand that W^UK can be absorbed into W^Q and W^UV can be absorbed into W^O? #27

Closed cc752424640 closed 4 months ago

luofuli commented 4 months ago

Here's a recommended blog for you: https://spaces.ac.cn/archives/10091. @cc752424640
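
For a quick numerical check of the claim in the issue title, here is a minimal single-head sketch in NumPy. It is not the DeepSeek-V2 implementation: the dimensions, the random weights, and the names `W_Q_abs` and `W_O_abs` are assumptions made for illustration, and it leaves out RoPE and the query compression. The idea is that keys and values are both linear in the cached latent `c_j` (`k_j = W_UK c_j`, `v_j = W_UV c_j`), so the score `q^T k_j` equals `(W_UK^T W_Q h_t)^T c_j` and the output `W_O sum_j a_j v_j` equals `(W_O W_UV) sum_j a_j c_j`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy sizes, chosen only for illustration.
d_model, d_head, d_latent, seq_len = 16, 8, 4, 5

W_Q = rng.standard_normal((d_head, d_model))    # query projection
W_UK = rng.standard_normal((d_head, d_latent))  # key up-projection
W_UV = rng.standard_normal((d_head, d_latent))  # value up-projection
W_O = rng.standard_normal((d_model, d_head))    # output projection

h_t = rng.standard_normal(d_model)              # current token's hidden state
C = rng.standard_normal((seq_len, d_latent))    # cached latents c_j, one per row


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


# Naive path: expand every cached latent into a full key and value.
q = W_Q @ h_t
K = C @ W_UK.T                                  # (seq_len, d_head)
V = C @ W_UV.T                                  # (seq_len, d_head)
scores_naive = K @ q / np.sqrt(d_head)
out_naive = W_O @ (softmax(scores_naive) @ V)

# Absorbed path: fold W_UK into the query side and W_UV into the output side,
# then attend directly over the latents without materializing K or V.
W_Q_abs = W_UK.T @ W_Q                          # (d_latent, d_model)
W_O_abs = W_O @ W_UV                            # (d_model, d_latent)
scores_abs = C @ (W_Q_abs @ h_t) / np.sqrt(d_head)
out_abs = W_O_abs @ (softmax(scores_abs) @ C)

print(np.allclose(scores_naive, scores_abs), np.allclose(out_naive, out_abs))
```

Both paths give the same scores and output up to floating-point error, which is all "absorbed" means here: the two products can be precomputed once, so inference only needs the cached latents. The linked blog covers the full setting, including how RoPE is handled.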