ROCm / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
http://pytorch.org

Improve vectorized_elementwise_kernel by increasing vector length. #1549

Closed (carlobertolli closed this 1 month ago)

carlobertolli commented 1 month ago

This patch increases the amount of work assigned to each thread in vectorized_elementwise_kernel. This reduces the number of thread blocks needed to execute the loops, at the cost of higher register usage per thread.
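To illustrate the trade-off, here is a minimal host-side sketch of how the block count scales with the work per thread. It is not code from this patch: the helper name `blocks_needed`, the block size of 256 threads, and the per-thread element counts are all assumptions chosen for illustration.

```python
def blocks_needed(n_elements, threads_per_block, elems_per_thread):
    """Number of thread blocks needed to cover n_elements (hypothetical helper)."""
    # Each block processes threads_per_block * elems_per_thread elements.
    work_per_block = threads_per_block * elems_per_thread
    # Ceiling division so a partial trailing block is still counted.
    return (n_elements + work_per_block - 1) // work_per_block


if __name__ == "__main__":
    n = 1 << 24  # 16,777,216 elements, an illustrative tensor size
    for elems in (4, 8, 16):  # illustrative values, not taken from the patch
        print(elems, blocks_needed(n, 256, elems))
    # prints:
    # 4 16384
    # 8 8192
    # 16 4096
```

Doubling the elements handled per thread halves the number of blocks launched, but each thread then holds more values in flight, which is where the extra register pressure comes from.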

Fixes #ISSUE_NUMBER

carlobertolli commented 1 month ago

Rebased against a different branch to preserve the release 2.4 branch.