NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

use static bdimx & bdimy in inner reduction #3329

Closed liqiangxl closed 6 days ago

liqiangxl commented 2 weeks ago

This PR changes the inner reduction scheduler to use static bdimx & bdimy. With the block dimensions known at compile time, more expressions can be simplified, which reduces register usage and slightly improves performance.

No performance change on A100.
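
To illustrate why static block dimensions allow more simplification, here is a minimal host-side C++ sketch. It is not nvFuser's generated code. The function names `row_sum_dynamic` and `row_sum_static`, the power-of-two restriction, and the loop structure are all assumptions made for illustration. Each function emulates one block reducing one row: in the dynamic variant `bdimx` is a runtime value (as when a kernel reads `blockDim.x`), while in the static variant `BDIMX` is a compile-time constant, so strides and trip counts are visible to the compiler.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical emulation of an inner reduction with a runtime block size.
// bdimx plays the role of blockDim.x and is only known when called.
float row_sum_dynamic(const std::vector<float>& row, std::size_t bdimx) {
    // Assumption for this sketch: bdimx is a power of two.
    assert(bdimx > 0 && (bdimx & (bdimx - 1)) == 0);
    std::vector<float> partial(bdimx, 0.0f);
    // Each emulated thread accumulates a strided slice of the row.
    for (std::size_t tid = 0; tid < bdimx; ++tid)
        for (std::size_t i = tid; i < row.size(); i += bdimx)
            partial[tid] += row[i];
    // Tree reduction over the per-thread partial sums.
    for (std::size_t stride = bdimx / 2; stride > 0; stride /= 2)
        for (std::size_t tid = 0; tid < stride; ++tid)
            partial[tid] += partial[tid + stride];
    return partial[0];
}

// Same reduction with the block size fixed at compile time.
// The stride and the tree-reduction trip counts are constants here,
// so the compiler can fold or unroll the index arithmetic.
template <std::size_t BDIMX>
float row_sum_static(const std::vector<float>& row) {
    static_assert(BDIMX > 0 && (BDIMX & (BDIMX - 1)) == 0,
                  "BDIMX must be a power of two in this sketch");
    std::array<float, BDIMX> partial{};  // zero-initialized
    for (std::size_t tid = 0; tid < BDIMX; ++tid)
        for (std::size_t i = tid; i < row.size(); i += BDIMX)
            partial[tid] += row[i];
    for (std::size_t stride = BDIMX / 2; stride > 0; stride /= 2)
        for (std::size_t tid = 0; tid < stride; ++tid)
            partial[tid] += partial[tid + stride];
    return partial[0];
}
```

Both variants compute the same sum, for example `row_sum_dynamic(row, 128)` and `row_sum_static<128>(row)`. The only difference is whether the block size is available to the compiler, which is the property the PR description credits for the extra expression simplification.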