alibaba / ChatLearn

A flexible and efficient training framework for large-scale alignment tasks
Apache License 2.0

[Feature] Why do MoE models need separate support in the framework? #158

Closed yiyepiaoling0715 closed 1 day ago

yiyepiaoling0715 commented 4 days ago

Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like A clear and concise description of what you want to happen.

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context Add any other context or screenshots about the feature request here.

Regarding the ChatLearn framework (https://github.com/alibaba/ChatLearn/tree/main): why does MoE need separate support? If a dense model such as Llama is simply swapped out for an MoE model such as DeepSeek-V2 for training, what problems would arise?

adoda commented 2 days ago

If you run a small-scale MoE model that does not need any special parallelism strategy, the case is relatively simple. If you run at a larger scale where training enables expert parallelism (EP) but inference does not, additional support is required.
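To make the EP mismatch concrete, here is a minimal, hypothetical sketch (not ChatLearn's actual API; the function name and arguments are illustrative assumptions). With expert parallelism, each training rank only holds a slice of the experts, so before the weights can be handed to an inference engine that runs without EP, the expert shards have to be gathered and merged back into the full expert list — this resharding/parameter-sync step is the kind of "additional support" referred to above.

```python
# Hypothetical illustration only: gathering expert weights across an
# expert-parallel (EP) group so a non-EP inference engine can load them.
import torch
import torch.distributed as dist


def gather_expert_weights(local_experts, ep_group):
    """Collect the expert weights owned by every EP rank.

    local_experts: list of torch.Tensor, the experts held by this EP rank.
    ep_group:      the expert-parallel process group used during training.
    Returns the full, globally ordered list of expert weights.
    """
    ep_world_size = dist.get_world_size(ep_group)
    gathered = [None] * ep_world_size
    # all_gather_object exchanges each rank's (CPU-copied) expert tensors;
    # a production implementation would all_gather flattened GPU tensors
    # instead, this only shows the resharding step conceptually.
    dist.all_gather_object(
        gathered, [w.detach().cpu() for w in local_experts], group=ep_group
    )
    # Flatten back into the global expert order expected by the checkpoint.
    return [w for rank_experts in gathered for w in rank_experts]
```

Without a step like this (plus the reverse mapping when pushing updated weights back), the training and inference sides disagree on how MoE expert parameters are laid out, which is why larger MoE runs need dedicated support rather than a drop-in model swap.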