ROCm / triton

Development repository for the Triton language and compiler

WMMA instructions are not supported for GEMM #448

Closed. joviliast closed this issue 3 months ago.

joviliast commented 8 months ago

Here are the high-level tasks for enabling WMMA for GEMM:

This issue includes https://github.com/ROCmSoftwarePlatform/triton/issues/250.
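
For context, the workload in question is a `tl.dot`-based Triton GEMM kernel; on RDNA3 GPUs that dot should lower to WMMA instructions. The sketch below is illustrative only: the kernel name, block sizes, and the assumption that M, N, K are multiples of the block sizes are not taken from this issue.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                  stride_am, stride_ak,
                  stride_bk, stride_bn,
                  stride_cm, stride_cn,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr):
    # One program computes one BLOCK_M x BLOCK_N tile of C = A @ B.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for _ in range(0, K, BLOCK_K):  # assumes M, N, K are multiples of the block sizes
        a = tl.load(a_ptrs)
        b = tl.load(b_ptrs)
        acc += tl.dot(a, b)  # this tl.dot is what the WMMA lowering targets
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    tl.store(c_ptrs, acc.to(tl.float16))

# Launch on fp16 inputs; "cuda" is the device string PyTorch/Triton also use on ROCm.
M, N, K = 1024, 1024, 1024
a = torch.randn(M, K, dtype=torch.float16, device="cuda")
b = torch.randn(K, N, dtype=torch.float16, device="cuda")
c = torch.empty(M, N, dtype=torch.float16, device="cuda")
grid = (M // 64, N // 64)
matmul_kernel[grid](a, b, c, M, N, K,
                    a.stride(0), a.stride(1),
                    b.stride(0), b.stride(1),
                    c.stride(0), c.stride(1),
                    BLOCK_M=64, BLOCK_N=64, BLOCK_K=32)
```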

joviliast commented 8 months ago

Support for WMMA in ConvertTritonGPUToLLVMPass is currently in progress. Auxiliary PRs:
https://github.com/ROCmSoftwarePlatform/triton/pull/435
https://github.com/ROCmSoftwarePlatform/triton/pull/436

Working branch: https://github.com/joviliast/triton/tree/wmma-convertions

joviliast commented 7 months ago

WMMA support is partially enabled in draft PR: https://github.com/ROCmSoftwarePlatform/triton/pull/481

joviliast commented 7 months ago

Performance results on the working branch using WMMA instructions:

matmul-performance-tflops.txt
fused-attention-tflops.txt

Unfortunately, I couldn't wait for the results without WMMA to compare against.
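
For reference, numbers like those in the attached files are typically collected with Triton's benchmarking helper. The sketch below is a minimal illustration, not the actual benchmark script attached above; `torch.matmul` stands in for the Triton matmul kernel under test.

```python
import torch
import triton

M = N = K = 4096
a = torch.randn(M, K, dtype=torch.float16, device="cuda")
b = torch.randn(K, N, dtype=torch.float16, device="cuda")

# do_bench returns the measured runtime in milliseconds.
ms = triton.testing.do_bench(lambda: torch.matmul(a, b))
tflops = 2 * M * N * K * 1e-12 / (ms * 1e-3)  # 2*M*N*K flops per GEMM
print(f"{tflops:.2f} TFLOPS")
```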

joviliast commented 7 months ago

The fused-attention kernel still produces mismatched results for now.
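
A mismatch here means the kernel's output disagrees with a reference implementation. A minimal check of that kind is sketched below; the `attention` handle stands in for the Triton fused-attention kernel under test and is hypothetical, not taken from this issue.

```python
import torch

def reference_attention(q, k, v, sm_scale):
    # Naive scaled-dot-product attention in full precision as the baseline.
    p = torch.softmax((q @ k.transpose(-2, -1)) * sm_scale, dim=-1)
    return p @ v

q = torch.randn(1, 4, 128, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)
sm_scale = 0.5

ref = reference_attention(q.float(), k.float(), v.float(), sm_scale).half()
out = attention(q, k, v, sm_scale)  # hypothetical handle to the Triton kernel under test
torch.testing.assert_close(out, ref, atol=2e-2, rtol=0)  # fails when results mismatch
```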

joviliast commented 6 months ago

All of the work has now moved upstream. Please check these PRs:

https://github.com/openai/triton/pull/3112 Introduce WMMA layout attr
https://github.com/openai/triton/pull/3170 Emit Indices for WMMA
https://github.com/openai/triton/pull/3171 Support SharedToDotOperandWMMA
https://github.com/openai/triton/pull/3199 Convert WMMA dot op to LLVM

joviliast commented 6 months ago

> fused-attention mismatches for now

Fused attention is fixed here: https://github.com/joviliast/triton/tree/wmma-convertions

joviliast commented 5 months ago

All required PRs have been merged upstream: https://github.com/openai/triton. Swizzling is not working yet; a fix is in progress.
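
For context, the swizzling in question is the XOR-based permutation that the shared-memory layout applies to avoid bank conflicts when tiles are staged through shared memory. The sketch below is a rough illustration of that scheme; the parameter values are illustrative, not Triton's actual defaults for WMMA.

```python
def swizzle(row, col, vec=8, per_phase=1, max_phase=8):
    # Elements within a row are permuted in groups of `vec` by XOR-ing the group
    # index with a row-dependent phase, so consecutive rows hit different banks.
    phase = (row // per_phase) % max_phase
    return ((col // vec) ^ phase) * vec + col % vec

# Example: the same logical columns land on different physical columns per row.
for r in range(4):
    print([swizzle(r, c) for c in range(0, 64, 8)])
```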