dvmazur / mixtral-offloading

Run Mixtral-8x7B models in Colab or consumer desktops
MIT License
2.29k stars, 225 forks

Support DeepSeek V2 model #36

Open Minami-su opened 4 months ago

Minami-su commented 4 months ago

DeepSeek V2 is a state-of-the-art MoE (mixture-of-experts) model. Are there any plans to support it?