ROCm / triton

Development repository for the Triton language and compiler
MIT License

[FA] Add FA tutorial with transV #385

Closed zhanglx13 closed 10 months ago

zhanglx13 commented 10 months ago

Make the V tensor k-major so that reading V from LDS with ds_read no longer runs into the transposition issue.
This PR adds a new version of the fused-attention tutorial instead of modifying the existing one, so the original is not "polluted".
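To illustrate what "k-major" means here, the following is a minimal pure-Python sketch, not the tutorial's code. It assumes that k is the reduction dimension of the second GEMM (P @ V) and that "k-major" means elements adjacent along k are adjacent in memory. The sizes `N_CTX` and `D_HEAD` and all helper names are hypothetical.

```python
# Hypothetical sizes, chosen only for illustration.
N_CTX, D_HEAD = 4, 2  # k (reduction) dimension and head dimension

# V as a logical [N_CTX, D_HEAD] matrix; the value 10*k + d encodes its position.
v = [[10 * k + d for d in range(D_HEAD)] for k in range(N_CTX)]


def flatten_row_major(mat):
    """Return the elements in the order they would occupy linear memory."""
    return [x for row in mat for x in row]


def transpose(mat):
    return [list(col) for col in zip(*mat)]


# Default layout: V stored row-major, so the head dimension is contiguous.
v_default = flatten_row_major(v)
# Assumed k-major layout: V^T stored row-major, so k is contiguous.
v_kmajor = flatten_row_major(transpose(v))


def load_default(k, d):
    return v_default[k * D_HEAD + d]  # stride along k is D_HEAD


def load_kmajor(k, d):
    return v_kmajor[d * N_CTX + k]  # stride along k is 1


# Both layouts describe the same logical matrix.
assert all(
    load_default(k, d) == load_kmajor(k, d)
    for k in range(N_CTX)
    for d in range(D_HEAD)
)

# In the k-major layout, one column of V (fixed d, all k) is a contiguous run.
column0_kmajor = v_kmajor[0:N_CTX]
print(column0_kmajor)
```

In the k-major layout, the values consumed along the reduction dimension sit next to each other, which is the property the description relies on to avoid transposing V when it is read back from LDS.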

Performance (T: causal=True, F: causal=False):

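| head dim | causal | triton-mlir | transV | speedup |
|----------|--------|-------------|--------|---------|
| D64 | F | 100 | 105.1 | 1.05 |
| D64 | F | 108 | 113 | 1.05 |
| D64 | F | 107 | 112.8 | 1.05 |
| D64 | F | 108 | 113.7 | 1.05 |
| D64 | F | 109 | 114.2 | 1.05 |
| D128 | F | 93 | 100 | 1.08 |
| D128 | F | 103 | 112.79 | 1.10 |
| D128 | F | 109 | 119.18 | 1.09 |
| D128 | F | 112 | 122.13 | 1.09 |
| D128 | F | 113 | 124.16 | 1.10 |
| D64 | T | 58 | 65.4 | 1.13 |
| D64 | T | 79 | 87.6 | 1.11 |
| D64 | T | 92 | 100.4 | 1.09 |
| D64 | T | 98 | 106.1 | 1.08 |
| D64 | T | 101 | 109.1 | 1.08 |
| D128 | T | 47 | 51.68 | 1.10 |
| D128 | T | 55 | 60.07 | 1.09 |
| D128 | T | 77 | 84.53 | 1.10 |
| D128 | T | 96 | 102.91 | 1.07 |
| D128 | T | 106 | 112.92 | 1.07 |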
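The speedup column appears to be the transV number divided by the triton-mlir number, rounded to two decimals. The short sketch below checks that reading against the first row of each configuration. The `rows` list and the `speedup` helper are illustrative names, not part of the PR.

```python
# (configuration, triton-mlir, transV) taken from the first row of each group above.
rows = [
    ("D64 F", 100, 105.1),
    ("D128 F", 93, 100),
    ("D64 T", 58, 65.4),
    ("D128 T", 47, 51.68),
]


def speedup(baseline, trans_v):
    """Ratio of the transV result to the baseline, rounded as in the table."""
    return round(trans_v / baseline, 2)


speedups = {name: speedup(base, tv) for name, base, tv in rows}
for name, s in speedups.items():
    print(f"{name}: {s:.2f}")
```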