Open Superkeyv opened 1 week ago
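We can split the batch along the sequence-length dimension before broadcasting it within tp_group. Each broadcast then carries only the needed slice of the sequence rather than the full batch, which saves time in get_batch.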
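To illustrate the idea, here is a minimal pure-Python sketch of slicing along the sequence dimension before the data would be broadcast. It is not the code from this PR: `slice_sequence`, `num_elements`, `part_rank`, and `num_parts` are hypothetical names, the batch is modeled as plain nested lists instead of tensors, and the contiguous equal-chunk split is an assumption (the actual partitioning scheme may differ). The broadcast itself is left out; the sketch only shows how much smaller the payload becomes.

```python
from typing import Dict, List

# Hypothetical stand-in for a batch: field name -> [batch][seq] values.
Batch = Dict[str, List[List[int]]]


def slice_sequence(batch: Batch, part_rank: int, num_parts: int) -> Batch:
    """Keep only the contiguous sequence chunk owned by part_rank (assumed layout)."""
    sliced: Batch = {}
    for name, rows in batch.items():
        seq_len = len(rows[0])
        if seq_len % num_parts != 0:
            raise ValueError(f"{name}: seq_len {seq_len} not divisible by {num_parts}")
        chunk = seq_len // num_parts
        start = part_rank * chunk
        # Slice every sample along the sequence dimension only.
        sliced[name] = [row[start:start + chunk] for row in rows]
    return sliced


def num_elements(batch: Batch) -> int:
    """Count the values that would have to be sent in a broadcast."""
    return sum(len(row) for rows in batch.values() for row in rows)


# Two samples, sequence length 8, two fields.
full_batch: Batch = {
    "tokens": [list(range(0, 8)), list(range(8, 16))],
    "labels": [list(range(1, 9)), list(range(9, 17))],
}

# Slice first, then (in a real setup) broadcast only the slice.
local_batch = slice_sequence(full_batch, part_rank=1, num_parts=4)
full_size = num_elements(full_batch)
local_size = num_elements(local_batch)
```

With `num_parts=4`, the sliced batch holds a quarter of the elements of the full batch, so the payload handed to the broadcast shrinks by the same factor under these assumptions.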