[DLight] Perf improvement for low_batch_gemv on Metal

apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators

https://tvm.apache.org/

Apache License 2.0

11.67k stars 3.45k forks

[DLight] Perf improvement for low_batch_gemv on Metal #17026

Closed by Hzfengsy 4 months ago

Hzfengsy commented 4 months ago

This PR improves the performance of low_batch_gemv on Metal by changing the schedule config. The speedup is around 2x when the bucket is larger than 2.