Yuxin-CV opened this issue 4 months ago (status: Open)
Hi, I suggest we change the FLOPs calculation used for MFU so that it matches the FlashAttention benchmark script.
Specifically, with a causal mask the current calculation can report more than 100% MFU at seq_len = 16k (189 * 2 / 312 = 1.21), which cannot be right. When FlashAttention is used, the attention FLOPs for the causal mask setting should be divided by 2.
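To make the arithmetic concrete, here is a minimal sketch of the proposed counting. The names `attention_fwd_flops` and `mfu` are hypothetical and come from neither codebase. It assumes the usual convention that the attention forward pass costs two matmuls (QK^T and PV) of 2 * batch * nheads * seqlen^2 * headdim FLOPs each, and that a causal mask skips roughly half of that work. It also reads the numbers above as 189 TFLOPs/s achieved under the halved count against a 312 TFLOPs/s hardware peak, which is an interpretation of the issue text rather than something it states.

```python
def attention_fwd_flops(batch, seqlen, nheads, headdim, causal):
    """Approximate matmul FLOPs for one attention forward pass.

    Hypothetical helper for illustration only.
    """
    # Two matmuls (QK^T and PV), each 2 * batch * nheads * seqlen^2 * headdim.
    flops = 4 * batch * nheads * seqlen**2 * headdim
    # With a causal mask only about half of the score matrix is computed,
    # so the counted work is halved.
    return flops // 2 if causal else flops


def mfu(achieved_tflops, peak_tflops):
    """Model FLOPs utilization as a fraction of peak throughput."""
    return achieved_tflops / peak_tflops


if __name__ == "__main__":
    full = attention_fwd_flops(1, 16384, 16, 128, causal=False)
    half = attention_fwd_flops(1, 16384, 16, 128, causal=True)
    print(full // half)                 # causal count is half the full count
    print(round(mfu(189 * 2, 312), 2))  # unhalved count: above 100%
    print(round(mfu(189, 312), 2))      # halved count: below 100%
```

The batch size, head count, and head dimension in the usage lines are placeholder values; only the factor of 2 between the causal and non-causal counts matters for the MFU discrepancy.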
Marking as stale. No activity in 60 days.