deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
MIT License

Add MoE offloading strategy? #28

Open Minami-su opened 4 months ago

Minami-su commented 4 months ago

https://arxiv.org/abs/2312.17238