InternLM / InternEvo

InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.
https://internevo.readthedocs.io/zh-cn/latest/?badge=latest
Apache License 2.0

feat(moe): support group mlp for moe #345

Closed: blankde closed this pull request 1 month ago

blankde commented 2 months ago

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just open the pull request and ask the maintainers for help.

Motivation

Please describe the motivation of this PR and the goal you want to achieve through this PR.

Modification

  1. Implement `GroupedFeedForward` and use it to implement the MoE experts.
  2. Add a communicator for the `GroupedxxxLinear` layers in `pipeline.py`.
  3. Add checkpoint support for weight parallelism (WP) and implement a script that converts between the grouped-expert layout and the sequential list of experts.
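To make items 1 and 3 concrete, here is a minimal pure-Python sketch of the idea behind a grouped expert MLP and the layout conversion. It assumes a hypothetical grouped layout in which each expert's `[d_in, d_out]` weight is concatenated along the rows into one `[E * d_in, d_out]` parameter, and that tokens arrive already sorted by expert with per-expert counts in `batch_sizes`. All names here (`sequential_forward`, `grouped_forward`, `list_to_grouped`, `grouped_to_list`, `batch_sizes`) are hypothetical and are not the `GroupedFeedForward` API from this PR; the sketch also uses a plain ReLU MLP for brevity, which may differ from the feed-forward the PR actually groups.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive (m x k) @ (k x n) product on nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def sequential_forward(xs: Sequence[Matrix], w1s: Sequence[Matrix],
                       w2s: Sequence[Matrix]) -> List[Matrix]:
    """Sequential layout: one (w1, w2) pair per expert, run one expert at a time."""
    return [matmul(relu(matmul(x, w1)), w2) for x, w1, w2 in zip(xs, w1s, w2s)]


def list_to_grouped(ws: Sequence[Matrix]) -> Matrix:
    """Hypothetical conversion: stack per-expert [d_in, d_out] weights along
    the rows into a single [E * d_in, d_out] grouped parameter."""
    return [list(row) for w in ws for row in w]


def grouped_to_list(w: Matrix, num_experts: int) -> List[Matrix]:
    """Inverse of list_to_grouped: split the grouped weight into per-expert blocks."""
    d_in = len(w) // num_experts
    return [[list(row) for row in w[e * d_in:(e + 1) * d_in]]
            for e in range(num_experts)]


def grouped_forward(x: Matrix, batch_sizes: Sequence[int],
                    w1: Matrix, w2: Matrix) -> Matrix:
    """Grouped layout: all tokens in one matrix, sorted by expert, where the
    next batch_sizes[e] rows belong to expert e. A fused grouped-GEMM kernel
    would handle every group in one launch; this reference slices per group."""
    num_experts = len(batch_sizes)
    d_in = len(w1) // num_experts
    d_hidden = len(w2) // num_experts
    out: Matrix = []
    start = 0
    for e, n in enumerate(batch_sizes):
        x_e = x[start:start + n]  # tokens routed to expert e (may be empty)
        w1_e = w1[e * d_in:(e + 1) * d_in]
        w2_e = w2[e * d_hidden:(e + 1) * d_hidden]
        out.extend(matmul(relu(matmul(x_e, w1_e)), w2_e))
        start += n
    return out


if __name__ == "__main__":
    # Two experts, d_model = 2, d_hidden = 3.
    w1s = [[[1.0, 0.0, -1.0], [0.5, 1.0, 0.0]],
           [[0.0, 2.0, 1.0], [1.0, -1.0, 0.5]]]
    w2s = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
           [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]]
    # One token routed to expert 0, two tokens routed to expert 1.
    tokens = [[[1.0, 2.0]], [[3.0, -1.0], [0.0, 1.0]]]

    reference = [row for g in sequential_forward(tokens, w1s, w2s) for row in g]
    grouped = grouped_forward([row for g in tokens for row in g],
                              [len(g) for g in tokens],
                              list_to_grouped(w1s), list_to_grouped(w2s))
    assert grouped == reference
    assert grouped_to_list(list_to_grouped(w1s), 2) == w1s
    print(grouped)
```

Both layouts compute the same outputs; what changes is that the grouped form keeps all experts in one parameter so a single kernel call can cover them, which is also why checkpoints need a conversion step when moving between the two layouts.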

BC-breaking (Optional)

Does the modification introduce changes that break the backward compatibility of the downstream repositories? If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here and update the documentation.

Checklist

Before PR:

After PR: