ROCm / triton

Development repository for the Triton language and compiler

[FA-qk-fp8] Add fp8 FA to 06-fused-attention-fwd-transV.py #475

Closed zhanglx13 closed 5 months ago

zhanglx13 commented 6 months ago

This PR adds the FA qk-fp8 kernel, in which only the first gemm (qk) is computed in fp8.
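For reference, below is a minimal single-tile sketch of that dtype split (it is not the PR's kernel, which follows the blocked structure of 06-fused-attention-fwd-transV.py): q and k arrive in fp8 and feed the first `tl.dot`, the softmax runs in fp32, and its result is cast to fp16 for the second `tl.dot` with v. Kernel and argument names are illustrative.

```python
import triton
import triton.language as tl


@triton.jit
def _attn_tile_qk_fp8(Q, K, V, O, sm_scale,
                      SEQ: tl.constexpr, D: tl.constexpr):
    # One program covers the whole (small) tile; the real kernel loops over
    # K/V blocks with the online-softmax update.
    offs_m = tl.arange(0, SEQ)
    offs_d = tl.arange(0, D)
    q = tl.load(Q + offs_m[:, None] * D + offs_d[None, :])  # fp8 in memory
    k = tl.load(K + offs_m[:, None] * D + offs_d[None, :])  # fp8 in memory
    v = tl.load(V + offs_m[:, None] * D + offs_d[None, :])  # fp16 in memory

    # First gemm on the fp8 operands; tl.dot accumulates in fp32.
    qk = tl.dot(q, tl.trans(k)) * sm_scale

    # Softmax in fp32.
    p = tl.exp(qk - tl.max(qk, 1)[:, None])
    p = p / tl.sum(p, 1)[:, None]

    # Second gemm stays in fp16.
    acc = tl.dot(p.to(tl.float16), v)
    tl.store(O + offs_m[:, None] * D + offs_d[None, :], acc.to(tl.float16))
```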

To keep things simple, it is assumed that torch has native support for the AMD fp8 data types.
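With that assumption, q and k can be cast directly with `tensor.to()` on the host before the launch. A hedged sketch, assuming a torch build that exposes the fnuz fp8 dtypes (e.g. `torch.float8_e4m3fnuz`) and a triton build that accepts fp8 tensors as kernel arguments; this is not the PR's benchmark code:

```python
import torch

seq, d = 128, 64
q = torch.randn(seq, d, device="cuda", dtype=torch.float16)
k = torch.randn(seq, d, device="cuda", dtype=torch.float16)
v = torch.randn(seq, d, device="cuda", dtype=torch.float16)

# Only q and k are cast to an AMD-native fp8 dtype; v stays in fp16
# so the second gemm runs in fp16.
q_fp8 = q.to(torch.float8_e4m3fnuz)
k_fp8 = k.to(torch.float8_e4m3fnuz)

o = torch.empty_like(v)
_attn_tile_qk_fp8[(1,)](q_fp8, k_fp8, v, o, d ** -0.5, SEQ=seq, D=d)
```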

It also removes the ( ) in the gemm, which is possible thanks to the fix in https://github.com/ROCmSoftwarePlatform/triton/pull/445.