Hi,
As mentioned both in the paper and on the webpage, DeepSeek-V2-Lite has 15.7B total params and 2.4B active params. However, my quick math gives a little over 2.6B active params, while my total-param count matches 15.7B (including the word embedding and lm_head). Please point out any mistake I might have made in calculating the active params.
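For reference, here is a sketch of the count I did. It assumes the hyperparameters in the released config.json (hidden_size=2048, 27 layers, MLA with kv_lora_rank=512 and no q-LoRA, 64 routed + 2 shared experts with 6 routed experts active per token, moe_intermediate_size=1408, a dense MLP only in the first layer, vocab 102400, untied lm_head); the variable names are mine, not the model code's:

```python
hidden = 2048
layers = 27
vocab = 102400

# MLA attention per layer (Lite config has no q-LoRA, so W_q is full-rank)
n_heads = 16
qk_nope, qk_rope, v_dim = 128, 64, 128
kv_lora = 512
w_q = hidden * n_heads * (qk_nope + qk_rope)      # query projection
w_dkv = hidden * (kv_lora + qk_rope)              # compressed KV down-projection
w_ukv = kv_lora * n_heads * (qk_nope + v_dim)     # KV up-projection
w_o = n_heads * v_dim * hidden                    # output projection
attn = w_q + w_dkv + w_ukv + w_o

# FFN: first layer is a dense MLP, the remaining 26 are MoE
dense_inter = 10944
moe_inter = 1408
shared, routed_active, routed_total = 2, 6, 64
dense_mlp = 3 * hidden * dense_inter              # gate / up / down projections
expert = 3 * hidden * moe_inter                   # one expert's gate / up / down
moe_active = (shared + routed_active) * expert + hidden * routed_total  # + router

active = (
    vocab * hidden                # word embedding
    + layers * attn
    + dense_mlp
    + (layers - 1) * moe_active
    + vocab * hidden              # lm_head (untied)
)
print(f"{active / 1e9:.2f}B active params")  # prints "2.66B active params"
```

This comes out to roughly 2.66B, which is where my "a little over 2.6B" figure comes from, so I suspect I am either double-counting the embedding/lm_head or miscounting something in the MLA projections.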
Thanks!