NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

Add toggle to disable logging of version string #1271

Open c-oak opened 4 months ago

c-oak commented 4 months ago

NCCL always prints its version unconditionally to stdout, even when ncclDebugFile is specified. This leads to pretty significant log spam during the startup of large jobs.

Could we gate this behind a flag? Alternatively, I think it's saner to print this only to ncclDebugFile, since that keeps all of NCCL's output separate from other streams.
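A minimal sketch of the two behaviors proposed above: an on/off toggle, and routing the banner to the debug file when one is configured. This is not NCCL's code. `emit_version_banner`, `print_version`, and the placeholder version string are hypothetical, and stdout and the debug file are modeled as Python text streams.

```python
import io
import sys
from typing import Optional, TextIO


def emit_version_banner(version: str,
                        debug_file: Optional[TextIO] = None,
                        print_version: bool = True,
                        stdout: Optional[TextIO] = None) -> None:
    """Hypothetical gating logic for a library's version banner (not NCCL code).

    - print_version=False suppresses the banner entirely (the requested toggle).
    - If a debug file is configured, the banner goes only there, so the
      library's output stays separate from the application's stdout.
    - Otherwise it falls back to stdout, matching the behavior described above.
    """
    if not print_version:
        return
    if debug_file is not None:
        target = debug_file
    else:
        target = stdout if stdout is not None else sys.stdout
    target.write(version + "\n")


# Usage: with a debug file configured, nothing reaches stdout.
fake_stdout, fake_debug = io.StringIO(), io.StringIO()
emit_version_banner("NCCL version <placeholder>", debug_file=fake_debug, stdout=fake_stdout)
print(repr(fake_stdout.getvalue()), repr(fake_debug.getvalue()))
```

Either option alone removes the banner from stdout when a debug file is in use; the toggle additionally lets jobs drop it altogether.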

kiskra-nvidia commented 4 months ago

Thank you for pointing this out. We'll change it in the next release so that it behaves in line with the rest of the NCCL debug output. As for the spammy nature, we're actually thinking of printing it from all ranks, as version inconsistencies between ranks can lead to weird bugs (see #1267 for a recent example).
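Printing the version from every rank is useful because it makes a mismatch visible in the logs. A minimal sketch of spotting such a mismatch automatically, assuming each rank's version has already been collected into one mapping and formatted as a string. How the values are exchanged is out of scope here; NCCL's public `ncclGetVersion()` is one possible per-rank source. `find_version_mismatches` and the sample version strings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def find_version_mismatches(rank_versions: Mapping[int, str]) -> Dict[str, List[int]]:
    """Group ranks by the version they report (hypothetical helper).

    Returns an empty dict when every rank agrees; otherwise maps each
    distinct version to the sorted list of ranks reporting it.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for rank in sorted(rank_versions):
        groups[rank_versions[rank]].append(rank)
    return dict(groups) if len(groups) > 1 else {}


# Usage with made-up data: rank 1 was launched against a different build.
versions = {0: "2.18.3", 1: "2.19.3", 2: "2.18.3"}
for version, ranks in find_version_mismatches(versions).items():
    print(f"version {version}: ranks {ranks}")
```

A check like this produces a single summary line instead of one banner per rank, so it catches the inconsistency without adding to the startup log volume.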