Open jianzi123 opened 1 week ago
As this graph shows, NET/1 and GPU(0) are on the same NUMA node, and NET/0 and GPU(1) are on the same NUMA node, all under the same root complex (RC). Why does NCCL report PHB instead of NODE?
Could you include more of NCCL's debug output, and maybe also the topo file (generated using `NCCL_TOPO_DUMP_FILE`)? What does `nvidia-smi topo -m` say?
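For anyone gathering the same information, here is a minimal sketch of one way to collect it. It assumes Python is available on the node. The helper names `nccl_diag_env` and `gpu_topology_matrix`, the dump path `nccl_topo.xml`, and the choice of `INIT,GRAPH` for `NCCL_DEBUG_SUBSYS` are illustrative assumptions, not something taken from this thread.

```python
import os
import shutil
import subprocess


def nccl_diag_env(topo_dump_path="nccl_topo.xml", base_env=None):
    """Return a copy of the environment with NCCL diagnostics switched on.

    The dump path and subsystem list are illustrative defaults (assumptions).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_DEBUG"] = "INFO"                 # verbose NCCL logging
    env["NCCL_DEBUG_SUBSYS"] = "INIT,GRAPH"    # limit logs to init and topology search
    env["NCCL_TOPO_DUMP_FILE"] = topo_dump_path  # NCCL writes its detected topology here
    return env


def gpu_topology_matrix():
    """Return the text printed by `nvidia-smi topo -m`, or None if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "topo", "-m"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


# Example usage (the launch command is hypothetical; substitute your own job):
#   env = nccl_diag_env("nccl_topo.xml")
#   subprocess.run(["python", "train.py"], env=env)
#   print(gpu_topology_matrix())
```

Running the job with this environment should produce the NCCL log lines and the topology XML to attach here, and the second helper captures the `nvidia-smi topo -m` matrix for comparison.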