[XPU] fix performance degradation of sharding_stage1_overlap

PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice （『飞桨』核心框架，深度学习&机器学习高性能单机、分布式训练和跨平台部署）

http://www.paddlepaddle.org/

Apache License 2.0

21.63k stars 5.44k forks source link

[XPU] fix performance degradation of sharding_stage1_overlap #63920

Closed lj970926 closed 1 week ago

lj970926 commented 1 week ago

PR Category

Performance Optimization

PR Types

Performance

Description

support bf16 for flatten and fill_any
adapt TensorSetConstantXPU to use xpu::constant instead of H2D
refine test for fill_any

paddle-bot[bot] commented 1 week ago

你的PR提交成功，感谢你对开源项目的贡献! 请关注后续CI自动化测试结果，详情请参考Paddle-CI手册。 Your PR has been submitted. Thanks for your contribution! Please wait for the result of CI firstly. See Paddle CI Manual for details.