Closed tenpercent closed 7 months ago
@tenpercent
`ck.Fw.Op`, together with its underlying ck-tiled implementation, supports MQA/GQA even though the supported input tensors are 4-D.
`ref_attention` in test_mem_eff_attention.py cannot handle MQA/GQA with 4-D inputs, which is why `ref_mqa_attention` is added.
`test_mqa_forward` is added to explicitly verify these functions.
Since we added too many scripts for this function, we can just remove them, and I will keep the scripts privately for testing/verification.
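To illustrate what a reference implementation has to do for MQA/GQA with 4-D inputs, here is a minimal pure-Python sketch. It is not the `ref_mqa_attention` code from this PR: the function name `ref_gqa_attention_4d`, the `[B][M][H][K]` nested-list layout, and the rule that consecutive query heads share one key/value head are all assumptions made for illustration.

```python
import math


def ref_gqa_attention_4d(q, k, v, scale=None):
    """Naive attention over nested lists shaped [B][M][H][K] (hypothetical helper).

    q has Hq heads; k and v have Hkv heads, with Hq % Hkv == 0.
    Hkv == 1 corresponds to MQA, 1 < Hkv < Hq to GQA, Hkv == Hq to plain MHA.
    """
    B, Mq, Hq, K = len(q), len(q[0]), len(q[0][0]), len(q[0][0][0])
    Mkv, Hkv = len(k[0]), len(k[0][0])
    Kv = len(v[0][0][0])
    if Hq % Hkv != 0:
        raise ValueError("query heads must be a multiple of key/value heads")
    group = Hq // Hkv
    if scale is None:
        scale = 1.0 / math.sqrt(K)

    out = [[[[0.0] * Kv for _ in range(Hq)] for _ in range(Mq)] for _ in range(B)]
    for b in range(B):
        for h in range(Hq):
            # Assumed mapping: each run of `group` query heads reads one K/V head.
            hk = h // group
            for i in range(Mq):
                scores = [
                    scale * sum(q[b][i][h][d] * k[b][j][hk][d] for d in range(K))
                    for j in range(Mkv)
                ]
                # Numerically stable softmax over the key positions.
                top = max(scores)
                weights = [math.exp(s - top) for s in scores]
                total = sum(weights)
                for d in range(Kv):
                    out[b][i][h][d] = (
                        sum(weights[j] * v[b][j][hk][d] for j in range(Mkv)) / total
                    )
    return out
```

A test along the lines of `test_mqa_forward` could compare the operator's output against such a reference, or against a plain multi-head reference run on K/V tensors whose heads have been repeated `Hq // Hkv` times.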
Addressing
Also merging `test_mqa_forward` into `test_mqa_decoding`, as suggested in