ROCm / triton

Development repository for the Triton language and compiler
MIT License

[MFMA] Move operand casts to AccelerateMatMul pass #477

Closed binarman closed 5 months ago

binarman commented 6 months ago

This PR moves operand casts out of the Python code and the TTGIR-to-LLVM phase and into the AccelerateAMDMatmul pass.
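As a rough illustration of what this kind of cast does (a hypothetical sketch, not the actual Triton or pass source; all names here are made up): MFMA instructions accumulate in a fixed set of types, so a low-precision C/D accumulator has to be promoted before the dot and cast back afterwards, regardless of whether that logic lives in the Python frontend or in a compiler pass.

```python
# Illustrative sketch only: the accumulator-promotion decision that the PR
# moves from the Python frontend into the AccelerateAMDMatmul pass.
# MFMA_ACC_DTYPES and promote_accumulator are hypothetical names.

MFMA_ACC_DTYPES = {"f32", "i32"}  # accumulator types MFMA natively supports


def promote_accumulator(acc_dtype: str) -> str:
    """Return the dtype the C/D accumulator must be cast to before the dot.

    MFMA accumulates in f32 (float inputs) or i32 (integer inputs), so a
    low-precision accumulator such as f16 is promoted before the dot and
    cast back to its original type afterwards.
    """
    if acc_dtype in MFMA_ACC_DTYPES:
        return acc_dtype          # already supported: no cast needed
    if acc_dtype.startswith("f"):
        return "f32"              # f16/bf16 accumulator -> f32
    return "i32"                  # narrow integer accumulator -> i32
```

Doing this in a pass rather than in Python keeps the decision next to the instruction selection that actually knows which accumulator types the target MFMA variant supports.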

alefimov-amd commented 6 months ago

Converted to draft, because we first want to merge mfma4x64 support and the test-generating scripts (for the AccelerateMatmul lit tests).

zhanglx13 commented 5 months ago

@alefimov-amd @scxiao I'd suggest we close this one. The type promotion has been upstreamed, and we don't need it in terms of perf.

alefimov-amd commented 5 months ago

@zhanglx13 I want to update this PR once upstream is stable (at least one more PR: https://github.com/openai/triton/pull/3025), to keep upstream and triton-mlir compatible.

Eventually, we will need to transfer everything we have in triton-mlir; if upstream and triton-mlir stay compatible, we can simply copy code from triton-mlir to upstream.

zhanglx13 commented 5 months ago

@alefimov-amd Ok, I'm also fine with that

binarman commented 5 months ago

This PR originally did three things:

1. Move C/D operand casts from Python to C++ code
2. Remove redundant MFMA support checks (we do not need them, since we have the new instruction selection)
3. Add casts of A/B operands in the case of mixed-dtype inputs

Now this PR covers only item 1. Item 2 has moved to #496. Item 3 is temporarily removed, since mixed precision is not supported in the tt.dot operation (MLIR verification fails); I want to discuss this with upstream first.
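For context on the A/B operand casts for mixed-dtype inputs, the idea is that when the two dot operands have different element types, both are cast to a common compute type before the dot. The sketch below is hypothetical (invented names and a simplified rank table, not the actual compiler logic), shown only to make the promotion rule concrete:

```python
# Hypothetical sketch of mixed-dtype A/B operand promotion: pick one common
# element type for both dot operands. FLOAT_RANK and common_operand_dtype
# are illustrative names, not part of Triton.

FLOAT_RANK = {"f8": 0, "f16": 1, "bf16": 1, "f32": 2}


def common_operand_dtype(a: str, b: str) -> str:
    """Return the element type both operands are cast to before the dot."""
    if a == b:
        return a                  # same type: no cast needed
    # f16 and bf16 have equal width but different formats, so neither can
    # represent the other exactly: widen both to f32.
    if FLOAT_RANK[a] == FLOAT_RANK[b]:
        return "f32"
    # Otherwise cast the narrower operand up to the wider type.
    return a if FLOAT_RANK[a] > FLOAT_RANK[b] else b
```

This is also why the MLIR verifier complaint matters: tt.dot rejects operands of differing element types, so the cast has to be inserted somewhere before the op is constructed.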