Closed by liqiangxl 6 days ago.
This PR changes the inner reduction scheduler to use static `bdimx` and `bdimy`. With the block dimensions fixed at compile time, more index expressions can be simplified, which reduces register usage and slightly improves performance.
No performance change was observed on A100.
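To illustrate why static block dimensions enable more simplification, here is a minimal host-side C++ sketch. It is an analogy under stated assumptions, not the scheduler's actual code: the function names `inner_reduce_dynamic` and `inner_reduce_static` are hypothetical. A flat loop index emulates thread `(tidx, tidy)` of a `bdimx` x `bdimy` block, and each `tidy` row is reduced along its innermost (x) dimension. In the dynamic variant the dimensions are runtime values, like reading `blockDim`, so `%` and `/` must be computed generically. In the static variant they are template constants, so the compiler can constant-fold or strength-reduce the same expressions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: emulate thread (tidx, tidy) of a block with a flat
// index and reduce each row (fixed tidy) along the innermost x dimension.

// "Dynamic" variant: bdimx/bdimy are only known at run time, so the index
// math (flat % bdimx, flat / bdimx) needs general division/modulo.
std::vector<float> inner_reduce_dynamic(const std::vector<float>& data,
                                        int bdimx, int bdimy) {
  assert(static_cast<int>(data.size()) == bdimx * bdimy);
  std::vector<float> row_sums(bdimy, 0.0f);
  for (int flat = 0; flat < bdimx * bdimy; ++flat) {
    int tidx = flat % bdimx;  // runtime modulo
    int tidy = flat / bdimx;  // runtime division
    row_sums[tidy] += data[tidy * bdimx + tidx];
  }
  return row_sums;
}

// "Static" variant: BDIMX/BDIMY are compile-time constants, so the compiler
// can fold the loop bound and, for power-of-two sizes, typically lower the
// % and / into cheaper shift/mask sequences.
template <int BDIMX, int BDIMY>
std::vector<float> inner_reduce_static(const std::vector<float>& data) {
  assert(static_cast<int>(data.size()) == BDIMX * BDIMY);
  std::vector<float> row_sums(BDIMY, 0.0f);
  for (int flat = 0; flat < BDIMX * BDIMY; ++flat) {
    int tidx = flat % BDIMX;  // constant modulo
    int tidy = flat / BDIMX;  // constant division
    row_sums[tidy] += data[tidy * BDIMX + tidx];
  }
  return row_sums;
}
```

Both variants compute the same row sums, for example `inner_reduce_dynamic(data, 32, 4)` and `inner_reduce_static<32, 4>(data)` on a 128-element input. Only the amount of work the compiler can eliminate differs. On a GPU, fewer live intermediate index values is what can translate into lower register usage.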