NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

Documentation: default of NCCL_IB_SPLIT_DATA_ON_QPS is wrong #1401

Closed: y1r closed this issue 3 months ago

y1r commented 3 months ago

The NCCL 2.22 documentation states that the default value of NCCL_IB_SPLIT_DATA_ON_QPS selects split mode. However, the default in the implementation appears to be 0 (round-robin), according to https://github.com/NVIDIA/nccl/blob/178b6b759074597777ce13438efb0e0ba625e429/src/transport/net_ib.cc#L1588.
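To make the difference between the two modes concrete, here is a toy model in Python. It is only a sketch of the behavior as the documentation describes it (split mode divides each message evenly across the queue pairs, round-robin sends each whole message on the next queue pair in turn); it is not NCCL's code, and the function name `assign_to_qps` and its parameters are hypothetical.

```python
def assign_to_qps(message_sizes, num_qps, split_data_on_qps):
    """Toy model: return the number of bytes each QP carries.

    This is an illustration only, not NCCL's implementation.
    """
    per_qp = [0] * num_qps
    next_qp = 0  # cursor used only in round-robin mode

    for size in message_sizes:
        if split_data_on_qps:
            # Split mode (value 1): divide each message evenly across all QPs.
            base, remainder = divmod(size, num_qps)
            for qp in range(num_qps):
                per_qp[qp] += base + (1 if qp < remainder else 0)
        else:
            # Round-robin mode (value 0): the whole message goes on one QP,
            # and the next message uses the next QP.
            per_qp[next_qp] += size
            next_qp = (next_qp + 1) % num_qps

    return per_qp


# A single 1024-byte message over 4 QPs:
print(assign_to_qps([1024], 4, split_data_on_qps=True))   # [256, 256, 256, 256]
print(assign_to_qps([1024], 4, split_data_on_qps=False))  # [1024, 0, 0, 0]
```

In this model, a single large message uses every queue pair in split mode but only one in round-robin mode, which is why the default matters when multiple QPs are configured.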

Could you update the documentation to match the implementation?

P.S. In my environment (H100 x 8, 400G CX-7 x 4), split mode with multiple QPs is necessary to get close to wire rate.
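For anyone who wants to try the setup described in the P.S., split mode has to be requested explicitly, since the implementation defaults to 0. The sketch below sets the relevant environment variables from Python before any NCCL-based library initializes. `NCCL_IB_SPLIT_DATA_ON_QPS` and `NCCL_IB_QPS_PER_CONNECTION` are documented NCCL variables; the helper name `nccl_ib_env` is hypothetical, and the QP count of 4 is an illustrative assumption, not a value reported in this issue.

```python
import os


def nccl_ib_env(qps_per_connection=4, split_data=True):
    """Build the NCCL InfiniBand settings as an environment mapping.

    qps_per_connection=4 is an assumed example value; tune it per system.
    """
    return {
        # Number of queue pairs NCCL creates per connection.
        "NCCL_IB_QPS_PER_CONNECTION": str(qps_per_connection),
        # 1 = split each message across QPs, 0 = round-robin per message.
        "NCCL_IB_SPLIT_DATA_ON_QPS": "1" if split_data else "0",
    }


# Apply before the process initializes NCCL, so the values are read at startup.
os.environ.update(nccl_ib_env())
```

Setting the same two variables in the job launcher or shell environment would have the same effect; the Python form is just convenient when the launch script is already Python.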

kiskra-nvidia commented 3 months ago

Yes, we've noticed this discrepancy as well. The documentation is scheduled to be fixed when the next NCCL release comes out. Thank you!