deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
MIT License

Device-Level Balance Loss and Communication Balance Loss #25

Closed (hsm1997 closed 4 months ago)

hsm1997 commented 4 months ago

What's the main difference between the device-level balance loss and the communication balance loss? As I read the paper, p_i' == p_i'', and f_i' = some_coeff * f_i''. Maybe f_i'' should be: ... (Token t is sent to Device i from Device j, where j != i)

hsm1997 commented 4 months ago

Maybe the authors already meant this by using the word "sent", i.e. a token only counts if it arrives from a different device...
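
To make the possible readings concrete, here is a minimal sketch that counts tokens per device in three ways. It is only an illustration under assumptions, not the authors' implementation: the function name `device_counts`, the `home_device` list (which device each token starts on), and the toy routing are all hypothetical, and the normalization constants from the paper are left out so that only the raw counts are compared.

```python
def device_counts(routing, expert_to_device, home_device, num_devices):
    """Count per-device load under three readings (all names are hypothetical).

    routing[t]          : list of expert ids selected by token t
    expert_to_device[e] : device that hosts expert e
    home_device[t]      : device token t lives on before dispatch (assumption)
    """
    selections = [0] * num_devices       # token-expert selections landing on a device
    received = [0] * num_devices         # distinct tokens routed to a device
    received_remote = [0] * num_devices  # same, but only tokens coming from another device
    for t, experts in enumerate(routing):
        devices = [expert_to_device[e] for e in experts]
        for d in devices:
            selections[d] += 1
        # A token is counted once per device, however many experts it picks there.
        for d in set(devices):
            received[d] += 1
            if d != home_device[t]:
                received_remote[d] += 1
    return selections, received, received_remote


# Toy setup: 4 experts on 2 devices, 3 tokens, top-2 routing.
expert_to_device = [0, 0, 1, 1]
routing = [[0, 1], [0, 1], [2, 0]]
home_device = [0, 1, 1]

selections, received, received_remote = device_counts(
    routing, expert_to_device, home_device, num_devices=2
)
print(selections)       # [5, 1]
print(received)         # [3, 1]
print(received_remote)  # [2, 0]
```

In this toy case the selection counts and the distinct-token counts are not proportional (5:1 versus 3:1), because a token that picks two experts on the same device is counted twice by the first and once by the second. Restricting to tokens with j != i changes the counts again, which is the variant suggested above.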