Closed: yzh119 closed this 3 weeks ago
262 is out of sync with main; this PR rebases its code onto the main branch.
This PR also greatly reduces the binary size, because we no longer need to compile prefill kernels for each GQA group size.