ROCm / triton

Development repository for the Triton language and compiler

WMMA instructions are not supported for GEMM #448

Closed. joviliast closed this issue 3 months ago.

joviliast commented 8 months ago

Here are the high-level tasks for enabling WMMA for GEMM:

This issue includes https://github.com/ROCmSoftwarePlatform/triton/issues/250.
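
For context, the workload in question is a `tl.dot`-based Triton GEMM kernel; on RDNA3 GPUs that dot should lower to WMMA instructions. The sketch below is illustrative only: the kernel name, block sizes, and the assumption that M, N, K are multiples of the block sizes are not taken from this issue.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                  stride_am, stride_ak,
                  stride_bk, stride_bn,
                  stride_cm, stride_cn,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr):
    # One program computes one BLOCK_M x BLOCK_N tile of C = A @ B.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for _ in range(0, K, BLOCK_K):  # assumes M, N, K are multiples of the block sizes
        a = tl.load(a_ptrs)
        b = tl.load(b_ptrs)
        acc += tl.dot(a, b)  # this tl.dot is what the WMMA lowering targets
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    tl.store(c_ptrs, acc.to(tl.float16))

# Launch on fp16 inputs; "cuda" is the device string PyTorch/Triton also use on ROCm.
M, N, K = 1024, 1024, 1024
a = torch.randn(M, K, dtype=torch.float16, device="cuda")
b = torch.randn(K, N, dtype=torch.float16, device="cuda")
c = torch.empty(M, N, dtype=torch.float16, device="cuda")
grid = (M // 64, N // 64)
matmul_kernel[grid](a, b, c, M, N, K,
                    a.stride(0), a.stride(1),
                    b.stride(0), b.stride(1),
                    c.stride(0), c.stride(1),
                    BLOCK_M=64, BLOCK_N=64, BLOCK_K=32)
```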

joviliast commented 8 months ago

Support for WMMA in ConvertTritonGPUToLLVMPass is currently in progress. Auxiliary PRs:
https://github.com/ROCmSoftwarePlatform/triton/pull/435
https://github.com/ROCmSoftwarePlatform/triton/pull/436

Working branch: https://github.com/joviliast/triton/tree/wmma-convertions

joviliast commented 7 months ago

WMMA support is partially enabled in draft PR: https://github.com/ROCmSoftwarePlatform/triton/pull/481

joviliast commented 7 months ago

Performance results on the working branch using WMMA instructions:

matmul-performance-tflops.txt
fused-attention-tflops.txt

Unfortunately, I couldn't wait for the results without WMMA to compare against.
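
For reference, numbers like those in the attached files are typically collected with Triton's benchmarking helper. The sketch below is a minimal illustration, not the actual benchmark script attached above; `torch.matmul` stands in for the Triton matmul kernel under test.

```python
import torch
import triton

M = N = K = 4096
a = torch.randn(M, K, dtype=torch.float16, device="cuda")
b = torch.randn(K, N, dtype=torch.float16, device="cuda")

# do_bench returns the measured runtime in milliseconds.
ms = triton.testing.do_bench(lambda: torch.matmul(a, b))
tflops = 2 * M * N * K * 1e-12 / (ms * 1e-3)  # 2*M*N*K flops per GEMM
print(f"{tflops:.2f} TFLOPS")
```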

joviliast commented 7 months ago

The fused-attention kernel still produces mismatched results for now.
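
A mismatch here means the kernel's output disagrees with a reference implementation. A minimal check of that kind is sketched below; the `attention` handle stands in for the Triton fused-attention kernel under test and is hypothetical, not taken from this issue.

```python
import torch

def reference_attention(q, k, v, sm_scale):
    # Naive scaled-dot-product attention in full precision as the baseline.
    p = torch.softmax((q @ k.transpose(-2, -1)) * sm_scale, dim=-1)
    return p @ v

q = torch.randn(1, 4, 128, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)
sm_scale = 0.5

ref = reference_attention(q.float(), k.float(), v.float(), sm_scale).half()
out = attention(q, k, v, sm_scale)  # hypothetical handle to the Triton kernel under test
torch.testing.assert_close(out, ref, atol=2e-2, rtol=0)  # fails when results mismatch
```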

joviliast commented 6 months ago

All of the work has now moved upstream. Please check these PRs:

https://github.com/openai/triton/pull/3112 Introduce WMMA layout attr
https://github.com/openai/triton/pull/3170 Emit Indices for WMMA
https://github.com/openai/triton/pull/3171 Support SharedToDotOperandWMMA
https://github.com/openai/triton/pull/3199 Convert WMMA dot op to LLVM

joviliast commented 6 months ago

> fused-attention mismatches for now

Fused attention is fixed here: https://github.com/joviliast/triton/tree/wmma-convertions

joviliast commented 5 months ago

All required PRs have been merged upstream: https://github.com/openai/triton. Swizzling is not working yet; a fix is in progress.
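
For context, the swizzling in question is the XOR-based permutation that the shared-memory layout applies to avoid bank conflicts when tiles are staged through shared memory. The sketch below is a rough illustration of that scheme; the parameter values are illustrative, not Triton's actual defaults for WMMA.

```python
def swizzle(row, col, vec=8, per_phase=1, max_phase=8):
    # Elements within a row are permuted in groups of `vec` by XOR-ing the group
    # index with a row-dependent phase, so consecutive rows hit different banks.
    phase = (row // per_phase) % max_phase
    return ((col // vec) ^ phase) * vec + col % vec

# Example: the same logical columns land on different physical columns per row.
for r in range(4):
    print([swizzle(r, c) for c in range(0, 64, 8)])
```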