Open · xingyaoww opened 4 months ago
It looks like the latest Megatron-LM already supports Mixture-of-Experts -- I'd be happy to see that supported here (for Mixtral)! I can also help contribute, but I don't have much experience rebasing Megatron-LLM onto upstream.
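For context, here is a rough sketch (my assumption, not something from this repo) of how MoE appears to be enabled when launching recent upstream Megatron-LM. The flag names (`--num-experts`, `--moe-router-topk`, `--expert-model-parallel-size`) are my reading of the upstream arguments and should be double-checked against whichever upstream commit a rebase would target:

```python
# Hypothetical launch sketch: Mixtral-like MoE settings for upstream
# Megatron-LM's pretrain_gpt.py. Flag names are assumptions based on
# recent upstream code; verify them before relying on this.
import subprocess

moe_args = [
    "--num-experts", "8",                  # Mixtral 8x7B uses 8 experts per MoE layer
    "--moe-router-topk", "2",              # Mixtral routes each token to 2 experts
    "--expert-model-parallel-size", "1",   # no expert parallelism in this sketch
]

# Launch with torchrun; a real run also needs the usual model/data/optimizer args.
subprocess.run(
    ["torchrun", "--nproc_per_node=8", "pretrain_gpt.py", *moe_args],
    check=True,
)
```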