Hi,
As mentioned both in the paper and on the webpage, DeepSeek-V2-Lite has 15.7B total params and 2.4B active params. However, my quick math gives a little over 2.6B active params, while my total-param count matches 15.7B (including the word embedding and lm_head). Please point out any mistake I might have made in calculating the active params.
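For reference, here is a sketch of the count I did. It assumes the hyperparameters in the released config.json (hidden_size=2048, 27 layers, MLA with kv_lora_rank=512 and no q-LoRA, 64 routed + 2 shared experts with 6 routed experts active per token, moe_intermediate_size=1408, a dense MLP only in the first layer, vocab 102400, untied lm_head); the variable names are mine, not the model code's:

```python
hidden = 2048
layers = 27
vocab = 102400

# MLA attention per layer (Lite config has no q-LoRA, so W_q is full-rank)
n_heads = 16
qk_nope, qk_rope, v_dim = 128, 64, 128
kv_lora = 512
w_q = hidden * n_heads * (qk_nope + qk_rope)      # query projection
w_dkv = hidden * (kv_lora + qk_rope)              # compressed KV down-projection
w_ukv = kv_lora * n_heads * (qk_nope + v_dim)     # KV up-projection
w_o = n_heads * v_dim * hidden                    # output projection
attn = w_q + w_dkv + w_ukv + w_o

# FFN: first layer is a dense MLP, the remaining 26 are MoE
dense_inter = 10944
moe_inter = 1408
shared, routed_active, routed_total = 2, 6, 64
dense_mlp = 3 * hidden * dense_inter              # gate / up / down projections
expert = 3 * hidden * moe_inter                   # one expert's gate / up / down
moe_active = (shared + routed_active) * expert + hidden * routed_total  # + router

active = (
    vocab * hidden                # word embedding
    + layers * attn
    + dense_mlp
    + (layers - 1) * moe_active
    + vocab * hidden              # lm_head (untied)
)
print(f"{active / 1e9:.2f}B active params")  # prints "2.66B active params"
```

This comes out to roughly 2.66B, which is where my "a little over 2.6B" figure comes from, so I suspect I am either double-counting the embedding/lm_head or miscounting something in the MLA projections.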
Thanks!