QwenLM / Qwen2

Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.

Could you provide a Megatron-conversion-specific version of modeling_qwen2_moe.py for qwen2-moe? #687

Open steins048596 opened 1 month ago

steins048596 commented 1 month ago

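Megatron's MoE implementation turns router_logits into routing weights by applying top-k first and then softmax. Your team's modeling_qwen2_moe.py applies softmax first and then top-k, with a parameter norm_topk_prob that controls whether the selected weights are renormalized afterwards. In qwen2-moe, norm_topk_prob is false, so for weights converted from Megatron the routing weights end up on the wrong scale (roughly 0.25 in Megatron versus a few hundredths in HF), and everything downstream breaks. Please add a comment noting that weights converted from Megatron need norm_topk_prob turned on.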
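To make the mismatch concrete, here is a minimal pure-Python sketch of the two routing orders. It is not the code from either library: the function names are hypothetical, and the 60 experts with top-4 selection are illustrative values chosen only so that the average selected weight is 0.25, as in the numbers quoted above. It relies on the identity that softmax over the top-k logits equals softmax over all logits followed by top-k and renormalization.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_indices(xs, k):
    """Indices of the k largest values, largest first."""
    return sorted(range(len(xs)), key=lambda i: xs[i], reverse=True)[:k]


def route_topk_then_softmax(logits, k):
    """Megatron-style order as described in this issue (hypothetical helper)."""
    idx = topk_indices(logits, k)
    weights = softmax([logits[i] for i in idx])
    return idx, weights


def route_softmax_then_topk(logits, k, norm_topk_prob):
    """HF-style order as described in this issue (hypothetical helper)."""
    probs = softmax(logits)
    idx = topk_indices(probs, k)
    weights = [probs[i] for i in idx]
    if norm_topk_prob:
        # Renormalize so the selected weights sum to 1.
        s = sum(weights)
        weights = [w / s for w in weights]
    return idx, weights


if __name__ == "__main__":
    # Illustrative router logits for 60 experts (assumed values, not real data).
    logits = [math.sin(i) for i in range(60)]
    print(route_topk_then_softmax(logits, 4))
    print(route_softmax_then_topk(logits, 4, norm_topk_prob=True))
    print(route_softmax_then_topk(logits, 4, norm_topk_prob=False))
```

With norm_topk_prob=True the second function returns the same experts and the same weights as the top-k-then-softmax order. With norm_topk_prob=False the selected weights sum to less than 1, which is the scale difference reported here.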

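++++++++++++++++++++++++

A related question: I noticed that in the qwen-moe series, shared_expert_gate gives the shared expert a learnable parameter that adjusts how much the shared expert contributes to the output. My guess is that the softmax-then-top-k order does a poor job of controlling the routed experts' share of the output, and that this is why the shared expert's contribution has to be adjusted to balance it. Is that the case?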

jklj077 commented 1 month ago

@bozheng-hit

github-actions[bot] commented 5 days ago

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.