Closed: y1r closed this issue 3 months ago.
In the NCCL 2.22 documentation, the default value/behavior of NCCL_IB_SPLIT_DATA_ON_QPS is listed as split mode. However, the implementation appears to default to 0 (round-robin mode), according to https://github.com/NVIDIA/nccl/blob/178b6b759074597777ce13438efb0e0ba625e429/src/transport/net_ib.cc#L1588.
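The two modes differ in how data is spread when a connection has more than one queue pair (QP). Here is a rough sketch of that difference, based on how the NCCL docs describe the modes. In split mode, each message is divided evenly across all QPs. In round-robin mode, each whole message goes to one QP, and successive messages rotate across QPs. This is not NCCL's code. The function names, the `(qp, offset, length)` tuple, and the way leftover bytes are distributed are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical model of one posted chunk: (qp_index, byte_offset, byte_length).
Chunk = Tuple[int, int, int]


def round_robin(message_sizes: List[int], num_qps: int) -> List[List[Chunk]]:
    """Mode 0 (sketch): each whole message is posted on a single QP, rotating per message."""
    plan = []
    for i, size in enumerate(message_sizes):
        plan.append([(i % num_qps, 0, size)])
    return plan


def split(message_sizes: List[int], num_qps: int) -> List[List[Chunk]]:
    """Mode 1 (sketch): each message is divided as evenly as possible across all QPs."""
    plan = []
    for size in message_sizes:
        base, rem = divmod(size, num_qps)
        chunks = []
        offset = 0
        for qp in range(num_qps):
            # Assumption: leftover bytes go to the lowest-numbered QPs.
            length = base + (1 if qp < rem else 0)
            if length:
                chunks.append((qp, offset, length))
            offset += length
        plan.append(chunks)
    return plan


# A single 1 MiB message with 4 QPs:
print(round_robin([1 << 20], 4))  # only QP 0 carries the message
print(split([1 << 20], 4))        # all 4 QPs carry 262144 bytes each
```

The model shows why a single large transfer only uses one QP in round-robin mode. All QPs are used only when there are several messages in flight, whereas split mode engages every QP on every message.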
Could you update the document to match the implementation?
P.S. In my H100x8 / 400G CX-7 x 4 environment, split mode with multiple QPs is necessary to get close to wire rate.
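Because the implemented default is round-robin, split mode has to be enabled explicitly. Here is a minimal sketch of doing that when launching a job from Python. It sets NCCL_IB_SPLIT_DATA_ON_QPS=1 together with NCCL_IB_QPS_PER_CONNECTION, since split mode only matters with more than one QP per connection. The helper name and the QP count of 4 are illustrative assumptions, not values taken from this thread.

```python
import os
from typing import Dict, Optional


def nccl_split_env(qps_per_connection: int, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: return a copy of the environment with NCCL IB split mode enabled."""
    if qps_per_connection < 1:
        raise ValueError("qps_per_connection must be >= 1")
    env = dict(os.environ if base is None else base)
    env["NCCL_IB_SPLIT_DATA_ON_QPS"] = "1"  # 1 = split mode, 0 = round-robin
    env["NCCL_IB_QPS_PER_CONNECTION"] = str(qps_per_connection)  # >1 needed for splitting to matter
    return env


env = nccl_split_env(4)  # 4 QPs is an illustrative choice; tune for your fabric
```

The returned dict can be passed as `env=` to `subprocess.run` for the launcher command. This leaves the parent process's environment untouched.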
Yes, we've noticed this discrepancy as well. The documentation is scheduled to be fixed when the next NCCL release comes out. Thank you!