Closed. Jiaao-Bai closed this issue 1 year ago.
Can you remove all tracing/logging, and run with -b 8 -e 4G -f 2? That would give us a better idea as to what's going on.
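For readers unfamiliar with these flags, here is a minimal sketch of the message sizes that such a sweep would cover. It assumes that -b is the minimum size, -e the maximum size, and -f a multiplicative step factor, and that the "G" suffix means 2**30 bytes; `sweep_sizes` is a hypothetical helper written for illustration, not part of nccl-tests.

```python
# Sketch only: enumerate the message sizes for "-b 8 -e 4G -f 2".
# Assumptions: "G" = 2**30 bytes, and each step multiplies the size by the factor.

def sweep_sizes(min_bytes, max_bytes, factor):
    """Return every size from min_bytes up to max_bytes, multiplying by factor."""
    sizes = []
    size = min_bytes
    while size <= max_bytes:
        sizes.append(size)
        size *= factor
    return sizes

sizes = sweep_sizes(8, 4 * 2**30, 2)
print(len(sizes), sizes[0], sizes[-1])
```

A sweep like this shows how bandwidth scales from tiny, latency-bound messages up to large, bandwidth-bound ones, which is more informative than a single size.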
After switching to NCCL v2.17.1, the result is OK.
Why is the result OK with NCCL v2.17.1? @Jiaao-Bai
Hi, I ran sendrecv_perf on 2 nodes with A100 GPUs. The bandwidth is 0.60 GB/s, but the ib_write_bw result is 23 Gb/s. Please give me some advice.
env:
NCCL version: 2.18.1-1
CUDA: 11.6
Hardware: 2 servers, each with 2 x 25 Gbps bonded network cards and 4 A100 GPUs
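Note that the two figures use different units (GB/s versus Gb/s), so they are easier to compare after conversion. The sketch below is a rough calculation only: it assumes both tools report decimal units (1 GB = 8 Gb) and that the bonded pair could in principle carry 2 x 25 Gb/s, which depends on the bonding mode. The variable names are chosen for illustration.

```python
# Sketch only: put the reported numbers into the same unit (GB/s).
# Assumptions: decimal units on both sides, and the bond can use both links.

BITS_PER_BYTE = 8

def gbit_to_gbyte(gbit_per_s):
    """Convert Gb/s to GB/s."""
    return gbit_per_s / BITS_PER_BYTE

ib_write_bw_gbyte = gbit_to_gbyte(23.0)      # ib_write_bw result from the report
bond_limit_gbyte = gbit_to_gbyte(2 * 25.0)   # nominal limit of the bonded pair
nccl_gbyte = 0.60                            # sendrecv_perf result from the report

fraction = nccl_gbyte / ib_write_bw_gbyte
print(f"ib_write_bw: {ib_write_bw_gbyte:.3f} GB/s")
print(f"bond limit:  {bond_limit_gbyte:.3f} GB/s")
print(f"NCCL reaches {fraction:.0%} of the ib_write_bw figure")
```

Under these assumptions, 23 Gb/s is about 2.9 GB/s, so the NCCL result is roughly a fifth of what ib_write_bw achieves rather than a fortieth.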
I did some work on the ncclGetUniqueId function to run NCCL without MPI. Please ignore the related entries in the log and env (NCCL_COMM_ID_NOT_PEER_0).
command on peer0:
command on peer1:
log peer0.log peer1.log
topology