Closed: carlobertolli closed this 1 month ago
This patch increases the amount of work assigned to each thread in vectorized_elementwise_kernel. This reduces the number of thread blocks needed to execute the loops, at the cost of higher register usage.
Fixes #ISSUE_NUMBER
Rebased against a different branch to preserve the release 2.4 branch.
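To make the trade-off concrete, here is a minimal sketch of the launch arithmetic. It assumes the grid size is the ceiling of the element count divided by the elements covered per block. The helper name `blocks_needed` and the values used (1M elements, 256 threads per block, 4 vs. 8 elements per thread) are hypothetical and chosen only for illustration; they are not taken from the patch or from the kernel's actual launch code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the patched code): number of thread blocks
// needed to cover n elements when every thread processes work_per_thread
// elements and every block has threads_per_block threads.
constexpr std::int64_t blocks_needed(std::int64_t n,
                                     std::int64_t threads_per_block,
                                     std::int64_t work_per_thread) {
    const std::int64_t elems_per_block = threads_per_block * work_per_thread;
    return (n + elems_per_block - 1) / elems_per_block;  // ceiling division
}

// Illustrative values only: doubling the per-thread work halves the grid.
static_assert(blocks_needed(1 << 20, 256, 4) == 1024, "4 elements per thread");
static_assert(blocks_needed(1 << 20, 256, 8) == 512, "8 elements per thread");
```

Under these assumptions, doubling the per-thread work halves the number of blocks to schedule. Each thread then keeps more elements in flight, which is where the additional register pressure comes from.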