[FA-qk-fp8] Add fp8 FA to 06-fused-attention-fwd-transV.py

ROCm / triton

Development repository for the Triton language and compiler

MIT License


Closed zhanglx13 closed 5 months ago

zhanglx13 commented 6 months ago

This PR adds the FA qk-fp8 kernel, in which only the first GEMM (the QK product) is computed in fp8.

To keep things simple, this assumes that the installed torch has native support for the AMD fp8 data types.
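
As a rough illustration of that assumption, here is a minimal sketch of a check that could be run before launching the kernel. The helper `missing_amd_fp8_dtypes` is hypothetical and not part of this PR. The dtype names `float8_e4m3fnuz` and `float8_e5m2fnuz` are the "fnuz" fp8 variants exposed by recent PyTorch builds; treating them as the ones this kernel needs is an assumption, not something stated in the PR.

```python
import importlib

# Assumption: these are the fp8 dtypes meant by "AMD fp8 data types".
# Both names exist in recent PyTorch builds; the PR does not list them.
AMD_FP8_DTYPE_NAMES = ("float8_e4m3fnuz", "float8_e5m2fnuz")


def missing_amd_fp8_dtypes(module=None):
    """Return the names from AMD_FP8_DTYPE_NAMES that `module` does not expose.

    Hypothetical helper: when `module` is None, torch is imported lazily,
    and every name is reported missing if torch is not installed.
    """
    if module is None:
        try:
            module = importlib.import_module("torch")
        except ImportError:
            return list(AMD_FP8_DTYPE_NAMES)
    return [name for name in AMD_FP8_DTYPE_NAMES if not hasattr(module, name)]


if __name__ == "__main__":
    missing = missing_amd_fp8_dtypes()
    if missing:
        print("torch lacks native fp8 dtypes:", ", ".join(missing))
    else:
        print("torch exposes the fp8 dtypes assumed above")
```

An empty result only shows that the dtype objects exist in torch. It does not confirm that a given GPU or Triton build can actually run the fp8 GEMM.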