PaddlePaddle / PaddleCustomDevice

PaddlePaddle custom device implementation. (Custom hardware integration for PaddlePaddle, 『飞桨』)
Apache License 2.0
68 stars 142 forks

[NPU] Transpose before gather to speed up #1212

Closed. Birdylx closed this 4 months ago

Birdylx commented 4 months ago

Gather along the last axis is too slow; transposing before the gather speeds it up significantly.
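As a rough illustration of the idea (not the code from this PR, which is not shown here), the sketch below checks that gathering along the last axis gives the same result as transposing, gathering along the first axis, and transposing back. It uses plain 2-D Python lists, and every function name in it is hypothetical. It demonstrates only the equivalence, not the NPU speedup.

```python
def gather_last_axis(x, index):
    """Direct gather along the last axis: out[i][j] = x[i][index[j]]."""
    return [[row[k] for k in index] for row in x]


def transpose(x):
    """Swap the two axes of a 2-D list."""
    return [list(col) for col in zip(*x)]


def gather_first_axis(x, index):
    """Gather whole rows: out[j] = x[index[j]]."""
    return [list(x[k]) for k in index]


def gather_last_axis_via_transpose(x, index):
    """Same result as gather_last_axis, expressed as
    transpose -> gather on axis 0 -> transpose back."""
    return transpose(gather_first_axis(transpose(x), index))


# Hypothetical sample data: a 3x4 matrix and three column indices.
x = [[0, 1, 2, 3],
     [4, 5, 6, 7],
     [8, 9, 10, 11]]
index = [3, 0, 2]

direct = gather_last_axis(x, index)
via_transpose = gather_last_axis_via_transpose(x, index)
print(direct == via_transpose)
```

Whether the extra transposes pay off depends on the device and kernel; one plausible reason is that gathering along the first axis copies contiguous rows, while gathering along the last axis reads strided elements.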

paddle-bot[bot] commented 4 months ago

Thanks for your contribution!