AlexCheema opened 3 months ago
I looked at this yesterday; it would be great if exo could support DeepSeek V2. It should be very similar to the llama sharding, applied to DeepseekV2DecoderLayer. But it may be worth trying model parallelism -> https://github.com/ml-explore/mlx-examples/pull/890
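For anyone picking this up, here's roughly what I mean by "similar to the llama sharding": each node holds a contiguous slice of the DeepseekV2DecoderLayer blocks and forwards hidden states to the next node. A minimal sketch of that idea below; the `Shard` / `ShardedDeepseekV2` names are just illustration, not exo's actual classes:

```python
# Minimal sketch of layer-range sharding for DeepSeek-V2, mirroring the
# llama approach: each node keeps only a contiguous slice of decoder
# layers and passes hidden states along. Hypothetical names throughout.
from dataclasses import dataclass
from typing import List

@dataclass
class Shard:
    model_id: str
    start_layer: int  # first decoder layer this node owns (inclusive)
    end_layer: int    # last decoder layer this node owns (inclusive)
    n_layers: int     # total decoder layers in the model

    def is_first(self) -> bool:
        return self.start_layer == 0

    def is_last(self) -> bool:
        return self.end_layer == self.n_layers - 1

class ShardedDeepseekV2:
    def __init__(self, shard: Shard, layers: List):
        # `layers` would be DeepseekV2DecoderLayer instances; only the
        # slice [start_layer, end_layer] actually lives on this node.
        self.shard = shard
        self.layers = layers[shard.start_layer : shard.end_layer + 1]

    def __call__(self, h):
        # On the first node `h` is the embedded input; on later nodes it
        # is the hidden state received from the previous node.
        for layer in self.layers:
            h = layer(h)
        # The last node applies the final norm + lm_head; otherwise `h`
        # is sent over the network to the node holding the next shard.
        return h
```

The nice thing for MoE is that the expert weights live inside each decoder layer, so this layer-wise split works without touching the routing logic; the PR linked above explores splitting experts across devices instead.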
would like to work on this :)
@345ishaan that would be great - go for it
Indeed, MoE is the most suitable application scenario for exo and should be prioritized for implementation. Really looking forward to it
Looking forward to support for the MoE DeepSeek-V2 (total: 236B, active: 21B):

| Model | #Total Params | #Activated Params | Context Length | Download |
|---|---|---|---|---|
| DeepSeek-V2 | 236B | 21B | 128k | 🤗 HuggingFace |
| DeepSeek-V2-Chat (RL) | 236B | 21B | 128k | 🤗 HuggingFace |
Yeah, I was planning to experiment with the setup using https://github.com/deepseek-ai/DeepSeek-Coder-V2. Will be looking into it this weekend.