hxdtest opened this issue 6 months ago.
That is because we currently support overlapping communication with GEMM only for AllGather and ReduceScatter. Those collectives are used when sequence parallelism is enabled; in the other cases tensor parallelism uses AllReduce, which does not have overlap support yet.
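To see why AllGather and ReduceScatter lend themselves to overlap with GEMM, here is a minimal single-process sketch in plain Python. It is an illustration under stated assumptions, not Megatron's or Transformer Engine's implementation: there is no real communication (the "transfers" are list indexing), and all function names are hypothetical. Because each output row of X @ W depends only on the matching input row, an AllGather followed by a GEMM can be split into per-shard GEMMs that run while the remaining shards are still arriving; likewise, a GEMM followed by a ReduceScatter can produce and reduce one sequence chunk of the output at a time. The sketch only checks that these chunked schedules give the same numbers as the unchunked ones.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive (m x k) @ (k x n) matrix product."""
    return [[sum(row[t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# --- AllGather + GEMM (column-parallel layer, input sharded by sequence) ---

def allgather_then_gemm(seq_shards: List[Matrix], w: Matrix) -> Matrix:
    """Baseline: wait for every sequence shard, then run one GEMM."""
    full_x = [row for shard in seq_shards for row in shard]
    return matmul(full_x, w)


def chunked_allgather_gemm(seq_shards: List[Matrix], w: Matrix, rank: int) -> Matrix:
    """Ring-style schedule seen from one rank: at each step it multiplies the
    shard it already holds, while in a real system the next shard is in flight."""
    world = len(seq_shards)
    out: List[Matrix] = [[] for _ in range(world)]
    for step in range(world):
        src = (rank - step) % world  # shard held at this step (own shard at step 0)
        out[src] = matmul(seq_shards[src], w)
    # Output rows depend only on matching input rows, so reassembling the
    # chunks in sequence order rebuilds the full result.
    return [row for chunk in out for row in chunk]


# --- GEMM + ReduceScatter (row-parallel layer, input sharded by hidden dim) ---

def gemm_then_reducescatter(x_shards: List[Matrix], w_shards: List[Matrix]) -> List[Matrix]:
    """Baseline: full partial GEMM on every rank, sum them, split rows by rank."""
    world = len(x_shards)
    total = matmul(x_shards[0], w_shards[0])
    for x, w in zip(x_shards[1:], w_shards[1:]):
        total = add(total, matmul(x, w))
    rows = len(total) // world
    return [total[c * rows:(c + 1) * rows] for c in range(world)]


def chunked_gemm_reducescatter(x_shards: List[Matrix], w_shards: List[Matrix]) -> List[Matrix]:
    """Produce one sequence chunk of the output at a time, so reducing chunk c
    could overlap with the GEMMs for chunk c + 1."""
    world = len(x_shards)
    rows = len(x_shards[0]) // world
    result = []
    for c in range(world):
        rows_c = slice(c * rows, (c + 1) * rows)
        acc = matmul(x_shards[0][rows_c], w_shards[0])
        for x, w in zip(x_shards[1:], w_shards[1:]):
            acc = add(acc, matmul(x[rows_c], w))  # partial sums for chunk c
        result.append(acc)  # chunk c is what rank c keeps after ReduceScatter
    return result


if __name__ == "__main__":
    # Two ranks; a 4-token, 3-wide activation split by sequence (2 tokens each).
    seq_shards = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
                  [[7.0, 8.0, 9.0], [1.0, 0.0, 1.0]]]
    w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for r in range(2):
        assert chunked_allgather_gemm(seq_shards, w, r) == allgather_then_gemm(seq_shards, w)

    # Same 4 tokens, but the hidden dimension (2) split across the two ranks.
    x_shards = [[[1.0], [3.0], [5.0], [7.0]], [[2.0], [4.0], [6.0], [8.0]]]
    w_shards = [[[1.0, 2.0]], [[3.0, 4.0]]]
    assert chunked_gemm_reducescatter(x_shards, w_shards) == gemm_then_reducescatter(x_shards, w_shards)
    print("chunked schedules match the unchunked baselines")
```

The sketch says nothing about whether AllReduce could be overlapped in principle; it only shows the row-independence that makes the two supported collectives easy to pipeline against GEMM.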
In Megatron, I found a check that requires `sequence_parallel` to be enabled whenever `tp_comm_overlap` is set. But why?
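The constraint in the question can be restated as a small argument check. The sketch below is a hypothetical Python illustration, not Megatron's actual source: the names `tp_collectives`, `OVERLAP_SUPPORTED`, `unsupported_overlaps`, and `validate_overlap_args` are made up here, and the collective lists are a simplified view of the forward pass. It maps the `sequence_parallel` setting to the collectives described in the reply and rejects `tp_comm_overlap` when any of them lacks overlap support.

```python
from typing import List, Tuple


def tp_collectives(sequence_parallel: bool) -> Tuple[str, ...]:
    """Collectives around the tensor-parallel GEMMs in the forward pass
    (simplified; the backward pass uses the mirrored collectives)."""
    if sequence_parallel:
        # Activations sharded by sequence: AllGather before the
        # column-parallel GEMM, ReduceScatter after the row-parallel GEMM.
        return ("all_gather", "reduce_scatter")
    # Activations replicated: one AllReduce after the row-parallel GEMM.
    return ("all_reduce",)


# Assumption taken from the reply in this thread: only these collectives
# currently have a communication/GEMM overlap implementation.
OVERLAP_SUPPORTED = frozenset({"all_gather", "reduce_scatter"})


def unsupported_overlaps(tp_comm_overlap: bool, sequence_parallel: bool) -> List[str]:
    """Collectives that would need overlap support but do not have it."""
    if not tp_comm_overlap:
        return []
    return [c for c in tp_collectives(sequence_parallel) if c not in OVERLAP_SUPPORTED]


def validate_overlap_args(tp_comm_overlap: bool, sequence_parallel: bool) -> None:
    """Hypothetical argument check: reject overlap when SP is off."""
    missing = unsupported_overlaps(tp_comm_overlap, sequence_parallel)
    if missing:
        raise ValueError(
            f"tp_comm_overlap has no GEMM-overlap path for {missing}; "
            "enable sequence_parallel so AllGather/ReduceScatter are used instead"
        )
```

Framed this way, the check is a consequence of which collectives are in play rather than an independent rule: turning on sequence parallelism swaps AllReduce for the two collectives that the overlap code can handle.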