Open kwen2501 opened 1 week ago
I would think ncclCommSplit mimics the behavior of ncclCommInitRank. So:
Q1: yes, it should Q2: checking the status of newcomm should work, so that you can poll on it then use it for comm operations . But indeed we would also need to wait for the split to complete until we can reuse the parent comm, hence the status of the parent comm should be ncclInProgress as well until it's safe to call new operations on the parent comm.
I might be missing something though; let me check internally.
Thanks @sjeaugey
Re Q1: Yeah, I had the same mental model / expectation. But I also saw some lines as below from PyTorch's log after using split in non-blocking mode:
[PG ID 1 PG GUID 1(default_pg:split:0) Rank 0] ProcessGroupNCCL created ncclComm_ 0 on CUDA device: 0
The "ncclComm_ 0" part seems to suggest that the value is not filled when ncclCommSplit
returns. It may be also due to bugs in PyTorch, I can isolate further.
Re Q2: Good point about the parent comm.
Ok. It looks like the current implementation is focusing on the parent comm and considering ncclCommSplit like a ncclAllReduce in that respect: the operation is not complete while the parent comm status is ncclInProgress and when it becomes ncclSuccess, then the newcomm is filled.
We'll see if we can improve the behavior to continue being consistent with the non-blocking semantics of the parent, but also be more consistent with the ncclCommInitRankConfig semantics.
I'm thinking about making the ncclCommSplit non-blocking only if both the parent comm and the new comm are non-blocking. Also, in non-blocking mode, the newcomm should be valid when we return, like for ncclCommInitRank. That seems to me to follow the constraints of both sides.
Of course we'd need to make the changes to confirm I'm not missing anything.
Sounds good, thanks!
Also, in non-blocking mode, the newcomm should be valid when we return, like for ncclCommInitRank.
nit: in blocking mode.
nit: in blocking mode.
Mmm, no, I meant in non-blocking more. In blocking mode when we return from ncclCommSplit, the new comm should be valid and fully initialized. In non-blocking mode, the new comm we return should be a valid object (as far as I understand, not the case currently), although probably not initialized yet (ncclInProgress).
Does that make sense?
Oh, my bad understanding of "valid", I took it as "initialized". All good!
Question 1: When calling
ncclCommSplit
in non-blocking API mode, would the value (i.e. pointer) fornewcomm
be filled before the API returns?Question 2: if we want to check if the split finishes, should we call
ncclGetAsyncError
on the parentcomm
or on thenewcomm
? Or either one is fine?