deepseek-ai/DeepSeek-MoE
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
MIT License · 982 stars · 48 forks
Loss becomes 0 when training the MoE model
#39 · Open · AlenjandroWang opened 2 months ago
AlenjandroWang commented 2 months ago:
(Screenshot QQ截图20240801000702.png — upload did not complete; no image available.)