Closed: zhanglx13 closed 5 months ago
This PR adds the FA qkfp8 kernel, in which only the first gemm is done in fp8.
To keep things simple, torch is assumed to have native support for the AMD fp8 data types.
It also removes the ( ) in the gemm, thanks to the fix in https://github.com/ROCmSoftwarePlatform/triton/pull/445
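
Since the kernel assumes that torch natively supports the AMD fp8 data types, a caller may want to check that before launching it. The following is only a minimal sketch of such a check, not code from this PR: the helper name `torch_has_amd_fp8` is hypothetical, and the dtype names `float8_e4m3fnuz` and `float8_e5m2fnuz` are an assumption about which fp8 types are meant (the PR does not name them).

```python
import importlib

# Assumption: the "AMD fp8 data types" are the fnuz variants exposed by
# recent PyTorch builds. The PR itself does not name the dtypes.
AMD_FP8_DTYPE_NAMES = ("float8_e4m3fnuz", "float8_e5m2fnuz")


def torch_has_amd_fp8(names=AMD_FP8_DTYPE_NAMES):
    """Return True if torch is importable and exposes every named dtype.

    Hypothetical helper for illustration; it only inspects attribute
    names and does not allocate tensors or touch the GPU.
    """
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # torch is not installed at all, so there is no fp8 support.
        return False
    return all(hasattr(torch, name) for name in names)


if __name__ == "__main__":
    print("native AMD fp8 dtypes available:", torch_has_amd_fp8())
```

A test or benchmark script could call this once up front and skip the qkfp8 path when it returns `False`.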