PaddlePaddle / Paddle

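PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)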
http://www.paddlepaddle.org/
Apache License 2.0

add pir decoder_attention_fuse_pass #65341

Open linkk08 opened 1 week ago

linkk08 commented 1 week ago

Feature Description

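Task goal: add decoder_attention_fuse_pass.
Use case: performance optimization for decoder models.
Description: add a PIR pass that fuses decoder attention. The pass fuses the q, k, v computation into a single large operator (without a mask).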
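
To illustrate what fusing the q, k, v computation into one operator means, here is a minimal pure-Python sketch. It is not Paddle code and does not use the PIR pass API. The function names `attention_unfused` and `attention_fused` are hypothetical, and the sketch assumes the fused pattern is standard scaled dot-product attention without a mask, which the issue does not spell out. The first function runs the separate ops a pattern-matching pass would find in the graph; the second computes the same result in one routine without materializing the intermediate score matrices.

```python
import math

def matmul(a, b):
    # Plain row-by-column matrix product on nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    # Numerically stable softmax applied to each row.
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(val - peak) for val in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention_unfused(q, k, v):
    # Hypothetical reference: four separate ops (matmul, scale, softmax, matmul).
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    scaled = [[s * scale for s in row] for row in scores]
    probs = softmax_rows(scaled)
    return matmul(probs, v)

def attention_fused(q, k, v):
    # Hypothetical fused form: same math in one routine, no mask,
    # no full score or probability matrix kept between steps.
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) / total
                    for j in range(len(v[0]))])
    return out

# Tiny example: 2 queries, 3 keys/values, head dimension 2.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

ref = attention_unfused(q, k, v)
fused = attention_fused(q, k, v)
max_diff = max(abs(a - b) for r, f in zip(ref, fused) for a, b in zip(r, f))
```

In this sketch `max_diff` stays within floating-point rounding error, which is the equivalence a fuse pass has to preserve when it rewrites the graph.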

Alternatives

No response

warrentdrew commented 1 week ago

Hello, thank you for your PR. Once CI passes, the relevant colleagues will be contacted to help review it.