ROCm / triton

Development repository for the Triton language and compiler
MIT License

[DotSlicing] Do not change the frequency of sliced ops. #531

Open htyu opened 4 months ago

htyu commented 4 months ago

Previously, sliced ops were always placed next to the sliced dot operation. This can change how often a sliced op executes when it is not in the same block as the dot op. For example, a flash attention kernel loads the Q tensor outside the dot loop, and dot slicing should not move the sliced loads into the loop. This change fixes that by placing the sliced ops next to the original op when that op is in a different block from the dot. We may lose the benefit of instruction reordering for ops in different blocks of the same loop, but I hope a later reordering pass can recover it.
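To make the placement rule concrete, here is a minimal toy sketch in Python. It is not the actual pass. The `Op` class, the block labels, and the `slice_anchor` helper are all hypothetical names made up for illustration. It only models the decision described above: slices go next to the dot when the original op shares the dot's block, and stay next to the original op otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Toy stand-in for an IR operation (hypothetical, not a Triton/MLIR class)."""
    name: str
    block: str  # hypothetical block label, e.g. "entry" or "loop_body"


def slice_anchor(op: Op, dot: Op) -> Op:
    """Return the op next to which the slices of `op` would be placed."""
    # Same block as the dot: the slices can sit next to the sliced dot.
    if op.block == dot.block:
        return dot
    # Different block: keep the slices next to the original op so they
    # execute as often as the original op did.
    return op


# Flash-attention-like shape: Q is loaded once, K is loaded inside the loop.
q_load = Op("load_q", "entry")
k_load = Op("load_k", "loop_body")
dot = Op("dot_qk", "loop_body")

print(slice_anchor(q_load, dot).name)  # load_q: slices stay outside the loop
print(slice_anchor(k_load, dot).name)  # dot_qk: slices sit next to the dot
```

In this model the sliced Q loads stay in the entry block and run once, while the sliced K loads are still placed beside the dot inside the loop.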