deepseek-ai / DeepSeek-MoE

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
MIT License

Loss = 0 occurs when training the MoE model #39

Open AlenjandroWang opened 2 months ago

AlenjandroWang commented 2 months ago

[Screenshot: QQ截图20240801000702.png (image upload did not complete)]
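Note: the issue body contains no details beyond the failed screenshot, so any diagnosis is speculative. A common cause of a collapsed loss in causal-LM fine-tuning is a batch where every target token is masked with `ignore_index` (-100), so the cross-entropy has no valid tokens to average over. Below is a minimal, hypothetical PyTorch sketch (not from this repo; `inspect_batch` and its shapes are illustrative assumptions) for checking whether that is happening:

```python
# Hypothetical debugging sketch: count how many label tokens actually
# contribute to the cross-entropy loss in a standard causal-LM setup.
import torch
import torch.nn.functional as F

def inspect_batch(logits: torch.Tensor, labels: torch.Tensor, ignore_index: int = -100):
    """Report how many label tokens are unmasked and what loss they produce."""
    # Standard causal-LM shift: predict token t+1 from position t.
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()

    valid = (shift_labels != ignore_index).sum().item()
    total = shift_labels.numel()
    loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=ignore_index,
    )
    print(f"valid label tokens: {valid}/{total}, loss: {loss.item():.4f}")
    return loss

# Toy usage: if every label is masked, there is no training signal and the
# reported loss degenerates (nan here; some training loops report it as 0).
logits = torch.randn(2, 8, 32)      # (batch, seq_len, vocab_size)
labels = torch.full((2, 8), -100)   # all positions masked out
inspect_batch(logits, labels)
```

If the valid-token count is nonzero and the loss is still exactly 0, it is worth checking precision settings (e.g. fp16 underflow) and how the auxiliary load-balancing loss is combined with the language-modeling loss.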