ROCm / triton

Development repository for the Triton language and compiler

MIT License

[FA] Add FA tutorial with transV #385

Closed by zhanglx13 10 months ago

zhanglx13 commented 10 months ago

Make the V tensor k-major so that we no longer suffer from the transposition issue when reading V from LDS with ds_read.
This PR adds a new version of the fused-attention tutorial so that the original one is not "polluted".
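
A minimal sketch of the layout change, not the tutorial's actual kernel. It assumes "k-major" means that V is contiguous along the reduction (sequence) dimension of the `P @ V` product, and it uses NumPy on the host as a stand-in for the Triton kernel. The function names and the sizes `n_ctx` and `d_head` are made up for illustration.

```python
import numpy as np


def attention_pv(p, v):
    """Second GEMM of attention, O = P @ V, with V stored row-major as (n_ctx, d_head)."""
    return p @ v


def attention_pv_transv(p, v_t):
    """Same GEMM, but V is supplied as a contiguous (d_head, n_ctx) buffer."""
    # v_t.T is a zero-copy view with logical shape (n_ctx, d_head);
    # only the strides differ from the row-major layout.
    return p @ v_t.T


rng = np.random.default_rng(0)
n_ctx, d_head = 128, 64  # illustrative sizes, not taken from the PR
p = rng.standard_normal((n_ctx, n_ctx)).astype(np.float32)
v = rng.standard_normal((n_ctx, d_head)).astype(np.float32)

# One-time relayout: after this, the reduction index of P @ V has unit stride.
v_t = np.ascontiguousarray(v.T)

out_ref = attention_pv(p, v)
out_transv = attention_pv_transv(p, v_t)

print("row-major V strides:", v.strides)
print("k-major V strides (logical view):", v_t.T.strides)
print("results match:", np.allclose(out_ref, out_transv, rtol=1e-4, atol=1e-3))
```

The result is numerically the same. The only difference is which index of V is contiguous in memory, which is the property the PR relies on to avoid transposing V when it is read back from LDS.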

Performance (T: causal=True, F: causal=False):

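| config | triton-mlir | transV | speedup |
|--------|-------------|--------|---------|
| D64 F | 100 | 105.1 | 1.05 |
| D64 F | 108 | 113 | 1.05 |
| D64 F | 107 | 112.8 | 1.05 |
| D64 F | 108 | 113.7 | 1.05 |
| D64 F | 109 | 114.2 | 1.05 |
| D128 F | 93 | 100 | 1.08 |
| D128 F | 103 | 112.79 | 1.10 |
| D128 F | 109 | 119.18 | 1.09 |
| D128 F | 112 | 122.13 | 1.09 |
| D128 F | 113 | 124.16 | 1.10 |
| D64 T | 58 | 65.4 | 1.13 |
| D64 T | 79 | 87.6 | 1.11 |
| D64 T | 92 | 100.4 | 1.09 |
| D64 T | 98 | 106.1 | 1.08 |
| D64 T | 101 | 109.1 | 1.08 |
| D128 T | 47 | 51.68 | 1.10 |
| D128 T | 55 | 60.07 | 1.09 |
| D128 T | 77 | 84.53 | 1.10 |
| D128 T | 96 | 102.91 | 1.07 |
| D128 T | 106 | 112.92 | 1.07 |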
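
The speedup column appears to be the transV figure divided by the triton-mlir figure, rounded to two decimals. The sketch below recomputes it from the numbers in the table under that assumption. The names `rows` and `recomputed_speedup` are illustrative only, and the units of the two throughput columns are not stated in the PR.

```python
# (config, triton-mlir, transV, reported speedup), copied from the table above.
rows = [
    ("D64 F", 100, 105.1, 1.05), ("D64 F", 108, 113, 1.05),
    ("D64 F", 107, 112.8, 1.05), ("D64 F", 108, 113.7, 1.05),
    ("D64 F", 109, 114.2, 1.05),
    ("D128 F", 93, 100, 1.08), ("D128 F", 103, 112.79, 1.10),
    ("D128 F", 109, 119.18, 1.09), ("D128 F", 112, 122.13, 1.09),
    ("D128 F", 113, 124.16, 1.10),
    ("D64 T", 58, 65.4, 1.13), ("D64 T", 79, 87.6, 1.11),
    ("D64 T", 92, 100.4, 1.09), ("D64 T", 98, 106.1, 1.08),
    ("D64 T", 101, 109.1, 1.08),
    ("D128 T", 47, 51.68, 1.10), ("D128 T", 55, 60.07, 1.09),
    ("D128 T", 77, 84.53, 1.10), ("D128 T", 96, 102.91, 1.07),
    ("D128 T", 106, 112.92, 1.07),
]


def recomputed_speedup(baseline, transv):
    """Ratio of the transV figure to the triton-mlir figure, rounded to two decimals."""
    return round(transv / baseline, 2)


# Collect any rows where the recomputed ratio disagrees with the reported one.
mismatches = [
    (config, baseline, transv, reported)
    for config, baseline, transv, reported in rows
    if abs(recomputed_speedup(baseline, transv) - reported) > 1e-9
]

ratios = [transv / baseline for _, baseline, transv, _ in rows]
print("rows checked:", len(rows))
print("mismatches:", mismatches)
print(f"speedup range: {min(ratios):.3f} to {max(ratios):.3f}")
```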