apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0
11.67k stars 3.45k forks

[DLight] Perf improvement for low_batch_gemv on Metal #17026

Closed Hzfengsy closed 4 months ago

Hzfengsy commented 4 months ago

This PR improves the performance of low_batch_gemv on Metal by changing the schedule config. The speedup is around 2x when the bucket is larger than 2.