Closed: jataylo closed this issue 1 year ago
I was not able to reproduce this particular error on MI100 or MI210 with the current ToT of Triton. Could you give more details on your environment?
P.S. I've tried changing int8 to uint8 and got a different error:
loc("./fail.py":135:21): error: 'llvm.uitofp' op result #0 must be floating point LLVM type or LLVM dialect-compatible vector of floating point LLVM type, but got 'i16'
Pass execution failed
LLVM ERROR: Failed to translate TritonGPU to LLVM IR.
Aborted (core dumped)
@binarman Thank you for taking a look at this. I've just confirmed that this reproducer passes with ToT Triton; I had been using the Aug 29th commit of Triton from our PyTorch 2.1 branch.
Once we merge https://github.com/ROCmSoftwarePlatform/triton/pull/354 and https://github.com/ROCmSoftwarePlatform/triton/pull/296, I will bump our Triton pin forward to resolve this.
@jayfurmanek @zhanglx13 @binarman cc: @dllehr-amd @jithunnair-amd
A few PyTorch unit tests are failing with the message:
After investigation, I have narrowed this down to any torch workload that enables Triton GEMMs for matmul using bfloat16 tensors.
Here is a Triton reproducer which has a passing fp16 matmul and a failing bf16 matmul.
Reproducer:
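(A minimal sketch of such a reproducer, assuming a standard Triton matmul kernel; the kernel body, block sizes, and problem sizes here are illustrative and not taken verbatim from the original report.)

```python
import torch
import triton
import triton.language as tl


@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                  stride_am, stride_ak, stride_bk, stride_bn,
                  stride_cm, stride_cn,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                  BLOCK_K: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn
    # Accumulate in fp32 regardless of the input element type.
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for _ in range(0, K, BLOCK_K):
        a = tl.load(a_ptrs)  # no masks: sizes assumed divisible by block sizes
        b = tl.load(b_ptrs)
        acc += tl.dot(a, b)
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    # The store casts the fp32 accumulator back to the output element type
    # (fp16 or bf16), exercising the bf16 lowering that fails on the
    # affected commit.
    tl.store(c_ptrs, acc.to(c_ptr.dtype.element_ty))


def run(dtype):
    M = N = K = 256
    a = torch.randn((M, K), device="cuda", dtype=dtype)
    b = torch.randn((K, N), device="cuda", dtype=dtype)
    c = torch.empty((M, N), device="cuda", dtype=dtype)
    grid = (triton.cdiv(M, 64), triton.cdiv(N, 64))
    matmul_kernel[grid](a, b, c, M, N, K,
                        a.stride(0), a.stride(1),
                        b.stride(0), b.stride(1),
                        c.stride(0), c.stride(1),
                        BLOCK_M=64, BLOCK_N=64, BLOCK_K=32)
    return c


run(torch.float16)   # compiles and runs
run(torch.bfloat16)  # aborts at compile time on the affected Triton commit
```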
For additional context, PyTorch's Triton matmul codegen is here: https://github.com/pytorch/pytorch/blob/main/torch/_inductor/kernel/mm.py#L31
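For reference, a minimal sketch of a PyTorch workload that drives that Inductor code path, assuming max-autotune mode (which enables Inductor's Triton mm templates):

```python
import torch

# Any matmul on bfloat16 tensors compiled with the Inductor Triton gemm
# templates enabled should hit the same bf16 lowering path.
a = torch.randn(256, 256, device="cuda", dtype=torch.bfloat16)
b = torch.randn(256, 256, device="cuda", dtype=torch.bfloat16)

compiled_mm = torch.compile(torch.mm, mode="max-autotune")
compiled_mm(a, b)
```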