PR Category
Performance Optimization
PR Types
Performance
Description
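When AutoMixedPrecisionPass runs before CINN, it inserts cast ops that convert the three inputs of layer_norm to fp16, and the three outputs of layer_norm become fp16 as well. In the decomposition logic for the composite operator, however, the last two outputs are fp32, so the check in check_decomp_outputs fails with:
PreconditionNotMetError: [Prim] For op pd_op.layer_norm, its origin 1-index output dtype float16 is not equal to decomp output dtype float32
Pcard-67164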
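To make the mismatch concrete, here is a minimal pure-Python sketch of the dtype comparison described above. It is only a model of the situation, not Paddle's implementation: `TensorMeta`, `amp_layer_norm_outputs`, `decomp_layer_norm_outputs`, and `check_outputs` are hypothetical names invented for illustration, and the real check lives in check_decomp_outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorMeta:
    """Hypothetical stand-in for an op output: only a name and a dtype."""
    name: str
    dtype: str


def amp_layer_norm_outputs():
    # As described in this PR: after AutoMixedPrecisionPass, all three
    # outputs of layer_norm are reported as float16.
    return [
        TensorMeta("out", "float16"),
        TensorMeta("mean", "float16"),
        TensorMeta("variance", "float16"),
    ]


def decomp_layer_norm_outputs(input_dtype):
    # As described in this PR: the decomposed op keeps the last two
    # outputs (assumed here to be mean and variance) in float32.
    return [
        TensorMeta("out", input_dtype),
        TensorMeta("mean", "float32"),
        TensorMeta("variance", "float32"),
    ]


def check_outputs(op_name, orig_outs, decomp_outs):
    # Illustrative dtype comparison modeled on the error message above;
    # raises on the first output whose dtypes disagree.
    for index, (orig, decomp) in enumerate(zip(orig_outs, decomp_outs)):
        if orig.dtype != decomp.dtype:
            raise TypeError(
                f"[Prim] For op {op_name}, its origin {index}-index output "
                f"dtype {orig.dtype} is not equal to decomp output dtype "
                f"{decomp.dtype}"
            )


if __name__ == "__main__":
    try:
        check_outputs(
            "pd_op.layer_norm",
            amp_layer_norm_outputs(),
            decomp_layer_norm_outputs("float16"),
        )
    except TypeError as err:
        print(err)
```

In this model, output 0 matches (both float16) and the check stops at output 1, which corresponds to the "1-index output" in the reported error.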