NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

[ENHANCEMENT] Is there support for training the Qwen2MoeForCausalLM MoE model (from the transformers library)? #856

Open liangshaopeng opened 5 months ago

liangshaopeng commented 5 months ago

**Is your feature request related to a problem? Please describe.**
I have seen that Megatron supports training MoE models, including scripts for the Mixtral 8x7B model: https://docs.nvidia.com/megatron-core/developer-guide/latest/api-guide/moe.html.
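
For reference, a minimal sketch of the MoE-related settings that the linked Megatron-Core guide exposes through `TransformerConfig`; the exact field names can vary between Megatron-Core versions, and the model sizes below are placeholders:

```python
# Minimal sketch (not from the issue) of the MoE settings exposed by
# Megatron-Core's TransformerConfig, per the MoE guide linked above.
# Field names may differ across Megatron-Core versions; sizes are placeholders.
from megatron.core.transformer.transformer_config import TransformerConfig

moe_config = TransformerConfig(
    num_layers=24,
    hidden_size=2048,
    num_attention_heads=16,
    # MoE settings
    num_moe_experts=8,                          # routed experts per MoE layer
    moe_router_topk=2,                          # experts activated per token
    moe_router_load_balancing_type="aux_loss",  # router load-balancing strategy
    moe_aux_loss_coeff=1e-2,                    # weight of the aux balancing loss
)
```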

However, at Alibaba we are interested in training Qwen2MoeForCausalLM MoE models with Megatron, where the model follows the architecture implemented in the transformers library: https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_moe/modeling_qwen2_moe.py.

Is such support available?
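
For illustration, here is a purely hypothetical sketch of how the routed-expert hyperparameters from the Hugging Face `Qwen2MoeConfig` might be mapped onto Megatron-Core's MoE settings. The `qwen2_moe_to_megatron` helper is not part of either codebase, and Qwen2-MoE's shared expert and shared-expert gate are not covered by this mapping, which is essentially what this request is about:

```python
# Hypothetical illustration only: a rough mapping of Qwen2-MoE hyperparameters
# (from the Hugging Face Qwen2MoeConfig) onto Megatron-Core MoE settings.
# The helper is not part of Megatron-LM or transformers, and the shared expert
# used by Qwen2-MoE is intentionally left out because it has no direct field here.
from transformers import AutoConfig
from megatron.core.transformer.transformer_config import TransformerConfig

def qwen2_moe_to_megatron(hf_cfg) -> TransformerConfig:
    """Translate the routed-expert part of a Qwen2MoeConfig into a TransformerConfig."""
    return TransformerConfig(
        num_layers=hf_cfg.num_hidden_layers,
        hidden_size=hf_cfg.hidden_size,
        num_attention_heads=hf_cfg.num_attention_heads,
        ffn_hidden_size=hf_cfg.moe_intermediate_size,   # per-expert FFN width
        num_moe_experts=hf_cfg.num_experts,
        moe_router_topk=hf_cfg.num_experts_per_tok,
        moe_aux_loss_coeff=hf_cfg.router_aux_loss_coef,
    )

megatron_cfg = qwen2_moe_to_megatron(AutoConfig.from_pretrained("Qwen/Qwen1.5-MoE-A2.7B"))
```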

github-actions[bot] commented 3 months ago

Marking as stale. No activity in 60 days.