PaddlePaddle / PaddleCustomDevice

PaddlePaddle custom device implementation. (PaddlePaddle custom hardware integration implementation)
Apache License 2.0
70 stars 148 forks

[NPU] slice & split aclnn kernel add environment variable #1300

Closed xuanyuanminzheng closed 4 months ago

xuanyuanminzheng commented 4 months ago
  1. For the slice kernel, the aclnn path is faster in scenarios with repeated compilation, while the aclop path is faster in scenarios without compilation. An environment variable selects between the two paths. For LLMs, leave it at False (the default). For small models, run export FLAGS_npu_scale_aclnn=True.
  2. The scale kernel has a precision problem in the LLM scenario whose cause has not yet been located. An environment variable is used to avoid the problem. For small models, run export FLAGS_npu_split_aclnn=True.
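
The two settings above could be applied from a Python launcher along these lines. This is only a sketch: the helper names `aclnn_flags_for` and `apply_flags` are hypothetical and not part of this PR, and it assumes the flags are read from the process environment at start-up, so they would need to be set before `paddle` is imported.

```python
import os

# Hypothetical helper, not part of this PR: maps a model kind to the two
# flags named in the description above.
def aclnn_flags_for(model_kind: str) -> dict:
    if model_kind not in ("llm", "small"):
        raise ValueError(f"unknown model kind: {model_kind!r}")
    # Small models opt in to the aclnn path; LLMs keep the default (False).
    value = "True" if model_kind == "small" else "False"
    return {
        "FLAGS_npu_scale_aclnn": value,
        "FLAGS_npu_split_aclnn": value,
    }

# Hypothetical helper: writes the flags into an environment mapping.
# Assumption: this runs before `import paddle`, so the flags are visible
# when the NPU plugin initializes.
def apply_flags(flags: dict, environ=os.environ) -> None:
    environ.update(flags)

if __name__ == "__main__":
    apply_flags(aclnn_flags_for("small"))
    print(os.environ["FLAGS_npu_scale_aclnn"], os.environ["FLAGS_npu_split_aclnn"])
```

Exporting the variables in the shell, as in the list above, has the same effect. The helper only keeps the LLM and small-model choices in one place.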
paddle-bot[bot] commented 4 months ago

Thanks for your contribution!