Open c-oak opened 4 months ago
Thank you for pointing this out. We'll change it for the next release to behave in line with the rest of the NCCL debug output. As for the spammy nature, we're actually thinking of printing it from all ranks, since version inconsistencies between ranks can lead to weird bugs (see #1267 for a recent example).
NCCL will always print its version unconditionally to stdout, even if ncclDebugFile is specified. This leads to pretty significant logspam during the startup of large jobs.
Could we gate this behind a flag? Alternatively, I think it's saner to only print this to ncclDebugFile, since that keeps all of NCCL's output separate from other streams.
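To make the request concrete, here is a rough sketch of the routing I have in mind, written in Python for brevity. It is not NCCL's actual code. `NCCL_DEBUG_FILE` is the existing environment variable; `HYPOTHETICAL_PRINT_VERSION` is a made-up flag name used only for illustration, and the function names are my own.

```python
import os
import sys


def version_stream(env=os.environ):
    """Pick the stream the version banner would go to under this proposal."""
    path = env.get("NCCL_DEBUG_FILE")
    if path:
        # Append so the banner lands next to the rest of the debug output.
        return open(path, "a")
    return sys.stdout


def print_version(version, env=os.environ):
    """Write the banner once; return False if the flag suppressed it."""
    # HYPOTHETICAL_PRINT_VERSION is an invented flag, not a real NCCL setting.
    if env.get("HYPOTHETICAL_PRINT_VERSION", "1") == "0":
        return False
    out = version_stream(env)
    try:
        out.write(f"NCCL version {version}\n")
    finally:
        if out is not sys.stdout:
            out.close()
    return True
```

With something like this, a job that sets the debug file would see nothing extra on stdout, and a job that sets neither variable would behave as it does today.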