PaddlePaddle / Paddle

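PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of 飞桨 / PaddlePaddle: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)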
http://www.paddlepaddle.org/
Apache License 2.0
21.64k stars, 5.44k forks

[Auto Parallel] Support MoE expert parallelism in dygraph auto parallel #63904

Open pkuzyc opened 1 week ago

pkuzyc commented 1 week ago

PR Category

Auto Parallel

PR Types

New features

Description

Pcard-76459 Support MoE expert parallelism in dygraph auto parallel. In auto-parallel expert parallelism, each expert's weights live on a different process mesh. This PR implements expert parallelism as follows:

[Image 1]
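The toy, framework-free Python sketch below shows what "each expert's weights live on a different process mesh" means in practice. It assumes a 1-D global mesh whose ranks are split evenly and contiguously among the experts, and it assumes top-1 routing. expert_submeshes and route_tokens are hypothetical helpers written for illustration. They are not Paddle APIs.

```python
from typing import Dict, List


def expert_submeshes(global_mesh: List[int], num_experts: int) -> List[List[int]]:
    """Split a 1-D global mesh into num_experts equal, contiguous sub-meshes.

    Sub-mesh e holds the weights of expert e (an assumed layout for this toy).
    """
    if num_experts <= 0 or len(global_mesh) % num_experts != 0:
        raise ValueError("rank count must be a positive multiple of num_experts")
    size = len(global_mesh) // num_experts
    return [global_mesh[e * size:(e + 1) * size] for e in range(num_experts)]


def route_tokens(expert_ids: List[int], num_experts: int) -> Dict[int, List[int]]:
    """Group token indices by the expert each token was routed to (top-1 gating)."""
    groups: Dict[int, List[int]] = {e: [] for e in range(num_experts)}
    for token, expert in enumerate(expert_ids):
        groups[expert].append(token)
    return groups


if __name__ == "__main__":
    global_mesh = [0, 1, 2, 3]  # all ranks taking part in training
    subs = expert_submeshes(global_mesh, num_experts=2)
    print(subs)  # [[0, 1], [2, 3]]
    # Tokens 0, 2 and 3 picked expert 1; token 1 picked expert 0.
    for expert, tokens in route_tokens([1, 0, 1, 1], num_experts=2).items():
        print(f"expert {expert} on ranks {subs[expert]} computes tokens {tokens}")
```

Each expert's tokens are computed only on the ranks of that expert's sub-mesh. As a result, the expert weights and the activations that feed them do not share a single global mesh.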

Main changes

  1. Add two APIs, local_tensor_list_from_dtensor and dtensor_from_local_list, to convert tensors between the global mesh and the local meshes.
  2. Fix problems that arise when an op's input tensors are on different meshes, a case that expert parallelism requires.
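The global/local conversion in item 1 can be pictured with the toy sketch below. It does not call Paddle. ToyDistTensor, to_local_list and from_local_list are hypothetical stand-ins that only model the idea, under the assumption that the expert weights are stacked along a leading "expert" dimension. A tensor on the global mesh is sliced along that dimension into pieces that each live on a sub-mesh, and the pieces are concatenated back. The real signatures and semantics of local_tensor_list_from_dtensor and dtensor_from_local_list may differ.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class ToyDistTensor:
    """Toy stand-in for a distributed tensor: a value and the ranks (mesh) holding it.

    value is a list of per-expert weight matrices; index 0 is the expert dimension.
    """
    mesh: List[int]
    value: List[Matrix]


def to_local_list(global_t: ToyDistTensor, submeshes: List[List[int]]) -> List[ToyDistTensor]:
    """Global -> local: slice the expert dimension, one piece per sub-mesh."""
    n = len(submeshes)
    if n == 0 or len(global_t.value) % n != 0:
        raise ValueError("expert count must be a positive multiple of the sub-mesh count")
    chunk = len(global_t.value) // n
    return [
        ToyDistTensor(mesh=list(sub), value=global_t.value[i * chunk:(i + 1) * chunk])
        for i, sub in enumerate(submeshes)
    ]


def from_local_list(local_ts: List[ToyDistTensor], global_mesh: List[int]) -> ToyDistTensor:
    """Local -> global: concatenate the pieces along the expert dimension."""
    covered = sorted(rank for t in local_ts for rank in t.mesh)
    if covered != sorted(global_mesh):
        raise ValueError("local meshes must exactly cover the global mesh")
    value = [w for t in local_ts for w in t.value]
    return ToyDistTensor(mesh=list(global_mesh), value=value)


if __name__ == "__main__":
    # Four experts, each with a 1x1 weight, on a 4-rank global mesh.
    w = ToyDistTensor(mesh=[0, 1, 2, 3], value=[[[1.0]], [[2.0]], [[3.0]], [[4.0]]])
    pieces = to_local_list(w, [[0, 1], [2, 3]])
    print([(p.mesh, p.value) for p in pieces])
    assert from_local_list(pieces, [0, 1, 2, 3]) == w  # round trip is lossless
```

from_local_list rejects local meshes that do not exactly cover the global mesh. This mirrors the requirement that the per-expert pieces together make up the whole global tensor.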
paddle-bot[bot] commented 1 week ago

Your PR has been submitted. Thanks for your contribution! Please wait for the CI results first. See the Paddle CI Manual for details.