Closed: htyu closed this 7 months ago
Approved: @htyu I see you've done a lot of work regarding tt.trans(). Just out of curiosity, why can't you transpose the tensor before feeding it to the FA or GEMM block?
Thanks for reviewing the patch! It's a good question. Our model runs two attention kernels, one forward and one backward, and the transposition is only used in the backward kernel.
The change enables stream pipelining for dot operations that involve a transposition, in particular tt.dot(Q, tt.trans(K)). It also fixes a performance issue where incorrect swizzling code was emitted based on the pre-transpose layout.
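For context, here is a minimal sketch of the pattern this targets (the kernel, names, and shapes are illustrative assumptions, not the PR's actual kernel): a loop over K tiles whose dot consumes an in-kernel transpose, which lowers to tt.dot(Q, tt.trans(K)) in Triton IR.

```python
import triton
import triton.language as tl

@triton.jit
def qk_scores_kernel(q_ptr, k_ptr, out_ptr,
                     N_CTX: tl.constexpr,
                     BLOCK_M: tl.constexpr,
                     BLOCK_N: tl.constexpr,
                     BLOCK_D: tl.constexpr):
    offs_m = tl.arange(0, BLOCK_M)
    offs_d = tl.arange(0, BLOCK_D)
    # Q tile stays resident across the loop.
    q = tl.load(q_ptr + offs_m[:, None] * BLOCK_D + offs_d[None, :])
    for start_n in range(0, N_CTX, BLOCK_N):
        offs_n = start_n + tl.arange(0, BLOCK_N)
        # K is loaded row-major as (BLOCK_N, BLOCK_D) and transposed in-kernel:
        # this dot lowers to the tt.dot(Q, tt.trans(K)) pattern in question.
        k = tl.load(k_ptr + offs_n[:, None] * BLOCK_D + offs_d[None, :])
        scores = tl.dot(q, tl.trans(k))  # (BLOCK_M, BLOCK_N), fp32 accumulator
        tl.store(out_ptr + offs_m[:, None] * N_CTX + offs_n[None, :], scores)
```

The loads of k in each iteration are what the stream pipeliner prefetches ahead of the dot; the interposed transpose is why both the pipelining and the swizzling logic need to account for the post-transpose layout.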