Closed: jtang10 closed this 4 months ago
Could you also make similar changes to flash-attention.py?
Do you want me to do it here, or can I fold that in with the fp8 changes in flash-attention.py? Though bf16 and fp8 are separate efforts, I would personally prefer to consolidate them in flash-attention.py for less overhead, if that is okay.
06-fused-attention-fwd-transV.py:test_op_fwd has 2 test cases failing, each of which has two elements in the final output just outside the error threshold.

06-fused-attention-transV.py:
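For context, a minimal sketch of the kind of elementwise tolerance check involved. The function name, threshold values, and structure here are assumptions for illustration, not the actual code in test_op_fwd:

```python
import torch

def check_outputs(ref_out: torch.Tensor, tri_out: torch.Tensor,
                  atol: float = 2e-2, rtol: float = 0.0) -> None:
    """Compare a Triton kernel output against a PyTorch reference.

    atol/rtol are illustrative placeholders; the thresholds used in
    06-fused-attention-fwd-transV.py may differ.
    """
    abs_err = (ref_out - tri_out).abs()
    # Elements "just outside the error threshold" exceed the allowed
    # tolerance by a small margin.
    over = abs_err > (atol + rtol * ref_out.abs())
    if over.any():
        raise AssertionError(
            f"{int(over.sum())} element(s) exceed tolerance; "
            f"max abs error {abs_err[over].max().item():.4e} vs atol {atol:.1e}"
        )
```

A failure with two offending elements would report `2 element(s) exceed tolerance` with a max error marginally above atol, matching the behavior described above.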