ROCm / triton

Development repository for the Triton language and compiler
MIT License
96 stars, 29 forks

Softmax use mask #644

Closed (rahulbatra85 closed 2 months ago)

rahulbatra85 commented 2 months ago

Use unmasked loads/stores in every loop iteration except the last one, which is the only iteration that can run past the end of the row and so still needs a mask. This improves performance on ROCm.
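For illustration only, the loop structure described here can be emulated in plain Python. This is a rough sketch of the idea and not the kernel code from this PR: `iter_blocks` and `softmax_row` are hypothetical names, list slices stand in for block loads, and the row is assumed to be non-empty with finite values.

```python
import math

def iter_blocks(row, block):
    """Yield (start, values, is_last) per block; only the last block is masked."""
    n_cols = len(row)
    n_blocks = (n_cols + block - 1) // block  # ceiling division
    # Every iteration except the last is fully in range: plain, unmasked load.
    for b in range(n_blocks - 1):
        start = b * block
        yield start, row[start:start + block], False
    # Last iteration may be partial: masked load, out-of-range lanes read -inf.
    if n_blocks > 0:
        start = (n_blocks - 1) * block
        vals = [row[start + i] if start + i < n_cols else -math.inf
                for i in range(block)]
        yield start, vals, True

def softmax_row(row, block=4):
    n_cols = len(row)
    # Pass 1: row maximum, for numerical stability (-inf padding never wins).
    row_max = max(max(vals) for _, vals, _ in iter_blocks(row, block))
    # Pass 2: sum of exponentials; padded lanes contribute exp(-inf) == 0.
    denom = sum(math.exp(v - row_max)
                for _, vals, _ in iter_blocks(row, block) for v in vals)
    # Pass 3: normalize and store; only the last block uses a masked store.
    out = [0.0] * n_cols
    for start, vals, is_last in iter_blocks(row, block):
        probs = [math.exp(v - row_max) / denom for v in vals]
        if is_last:
            for i, p in enumerate(probs):
                if start + i < n_cols:  # mask: skip out-of-range lanes
                    out[start + i] = p
        else:
            out[start:start + block] = probs  # unmasked store
    return out
```

In this sketch the bounds check is paid once per row instead of once per block, which is the saving the description refers to. The last block is always treated as masked, even when the row length is a multiple of the block size.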

rahulbatra85 commented 2 months ago

Wrong branch