Open CtfGo opened 4 months ago
Thanks for your quick reply, @sjeaugey! I have a few further questions to confirm ^^:

1. What NCCL APIs will block the CPU side by default? Are these the APIs that have no `cudaStream_t` parameter? Otherwise, is there any rule by which we can recognize them?
2. > No, the GPU-side isn't affected. Only the CPU-side, which in the case of send/recv, may include creating connections with other peers, and could therefore lead to hangs.

   Regarding `creating connections with other peers`: does this happen every time before the NCCL send/recv kernel is launched? And is it a blocking CPU-side operation by default?

> What nccl APIs will block CPU side in default ? are these APIs that have no param cudaStream_t?
The non-blocking attribute concerns all NCCL calls, for their CPU side. Init/Finalize, of course, are purely CPU based, but even `ncclSend` or `ncclAllReduce` may need to establish connections before the GPU kernel is launched and may block. So setting the communicator to non-blocking will tell NCCL to not block on the CPU call and return `ncclInProgress` if it would block.
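To make the `ncclInProgress` behavior concrete, here is a minimal sketch of the polling loop a caller might run after a call on a non-blocking communicator returns `ncclInProgress`. The helper name `wait_until_done` and the `USE_NCCL` macro are assumptions made for illustration, not part of NCCL; the stand-in types in the `#else` branch exist only so the sketch builds without NCCL or a GPU, and they do not reproduce the real enum values or communicator layout.

```c
#ifdef USE_NCCL
#include <nccl.h>  /* real NCCL: ncclComm_t, ncclResult_t, ncclCommGetAsyncError */
#else
/* Hypothetical stand-ins so the sketch builds without NCCL or a GPU.
   They only imitate "in progress for a few polls, then success". */
typedef enum { ncclSuccess = 0, ncclInProgress } ncclResult_t;
struct ncclComm { int polls_left; };
typedef struct ncclComm *ncclComm_t;

static ncclResult_t ncclCommGetAsyncError(ncclComm_t comm, ncclResult_t *state) {
  *state = (comm->polls_left-- > 0) ? ncclInProgress : ncclSuccess;
  return ncclSuccess;
}
#endif

/* Hypothetical helper: poll the communicator until its pending CPU-side
   work (for example, connection setup) is no longer in progress.
   Returns ncclSuccess, or the first error that is reported.
   If `polls` is non-NULL, it receives the number of polls performed. */
static ncclResult_t wait_until_done(ncclComm_t comm, long *polls) {
  ncclResult_t state = ncclInProgress;
  long n = 0;
  while (state == ncclInProgress) {
    ncclResult_t rc = ncclCommGetAsyncError(comm, &state);
    if (rc != ncclSuccess) return rc;  /* the query itself failed */
    n++;
    /* A real program could do other CPU work here or enforce a timeout. */
  }
  if (polls) *polls = n;
  return state;
}
```

With the real library, the communicator would be made non-blocking by initializing an `ncclConfig_t` from `NCCL_CONFIG_INITIALIZER`, setting `config.blocking = 0`, and passing it to `ncclCommInitRankConfig`; a helper like this would then be called whenever an NCCL call returns `ncclInProgress`.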
I understand now. Thank you very much for your careful explanation!
Hi, all
I found that there is an option `blocking` in `ncclConfig_t`, which the official documentation describes. There is also an example showing how this option affects the `ncclCommInitRankConfig` process.
Here are a few of my questions:
1. What does `non-blocking` mean? Does it mean that any NCCL call on the communicator will no longer block the CPU, nor cause device synchronization?
2. `non-blocking communicator` for them?