PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署)
http://www.paddlepaddle.org/
Apache License 2.0
21.63k stars 5.44k forks source link

[XPU] fix performance degradation of sharding_stage1_overlap #63920

Closed lj970926 closed 1 week ago

lj970926 commented 1 week ago

PR Category

Performance Optimization

PR Types

Performance

Description

  1. support bf16 for flatten and fill_any
  2. adapt TensorSetConstantXPU to use xpu::constant instead of H2D
  3. refine test for fill_any
paddle-bot[bot] commented 1 week ago

你的PR提交成功,感谢你对开源项目的贡献! 请关注后续CI自动化测试结果,详情请参考Paddle-CI手册。 Your PR has been submitted. Thanks for your contribution! Please wait for the result of CI firstly. See Paddle CI Manual for details.