Closed xrennvidia closed 1 month ago
/te-ci
/te-ci pytorch
Description
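This is a context parallelism (CP) implementation variant with KV all-gather. Currently, it can support:
Support for more functionality will be added later.
The KV all-gather communication is exposed (it is not overlapped with compute), but the overhead should be small with GQA/MQA, because fewer KV heads means less data to gather.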
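To give a rough sense of why the overhead shrinks with GQA/MQA, here is a back-of-the-envelope sketch of the KV all-gather volume. It is not code from this PR: the function name `kv_allgather_bytes`, its parameters, and the example shapes are all hypothetical, and it assumes the sequence is split evenly across CP ranks and that each rank receives the K and V shards of every other rank.

```python
def kv_allgather_bytes(seq_len, cp_size, num_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes one rank receives when K and V are all-gathered across the CP group.

    Hypothetical helper for illustration only. Counts one sequence and one
    attention layer; bytes_per_elem=2 corresponds to BF16/FP16.
    """
    if seq_len % cp_size != 0:
        raise ValueError("seq_len must be divisible by cp_size")
    local_tokens = seq_len // cp_size
    # K and V shards held by one rank (hence the factor of 2).
    shard_bytes = 2 * local_tokens * num_kv_heads * head_dim * bytes_per_elem
    # Each rank receives the shards of the other (cp_size - 1) ranks.
    return shard_bytes * (cp_size - 1)


# Example shapes (assumed, not taken from the PR): 8192 tokens, CP size 4, head_dim 128.
mha = kv_allgather_bytes(8192, 4, num_kv_heads=32, head_dim=128)  # 32 KV heads
gqa = kv_allgather_bytes(8192, 4, num_kv_heads=8, head_dim=128)   # 8 KV heads
mqa = kv_allgather_bytes(8192, 4, num_kv_heads=1, head_dim=128)   # 1 KV head

for name, nbytes in (("MHA", mha), ("GQA", gqa), ("MQA", mqa)):
    print(f"{name}: {nbytes / 2**20:.0f} MiB received per rank")
```

Under these assumptions the gathered volume scales linearly with the number of KV heads, so going from 32 to 8 KV heads cuts the exposed communication by 4x, and MQA cuts it by 32x, while the attention compute over the query heads stays the same.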
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: