MoFHeka opened this issue 2 months ago
NCCL does not use UCX by default. But I'm not sure what you're asking about.
There is a suite of NCCL benchmarks at https://github.com/NVIDIA/nccl-tests
I mean: how does NCCL's default point-to-point (P2P) communication performance compare with UCX's, and in which scenarios is each one faster?
That includes PCIe, RDMA, TCP/IP and other transports. I don't know what kind of test would be appropriate.
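One way to make the comparison concrete is to run the same point-to-point message-size sweep with each library over the same link (PCIe, RDMA or TCP/IP), for example with nccl-tests for NCCL and UCX's own `ucx_perftest` tool for UCX, and then compare bandwidth at each message size. The sketch below is a minimal post-processing helper under that assumption: it takes per-size timings you have already measured and reports which library was faster at each size. The helper names `message_sizes`, `algbw_gbps` and `compare` are hypothetical. Bandwidth follows what, as far as I know, nccl-tests uses for send/recv: bytes divided by seconds, with 1 GB = 1e9 bytes and bus bandwidth equal to algorithm bandwidth.

```python
from typing import Dict, List


def message_sizes(min_bytes: int, max_bytes: int, factor: int = 2) -> List[int]:
    """Geometric sweep of message sizes (min, min*factor, ... up to max)."""
    if min_bytes <= 0 or factor < 2:
        raise ValueError("min_bytes must be positive and factor must be >= 2")
    sizes = []
    size = min_bytes
    while size <= max_bytes:
        sizes.append(size)
        size *= factor
    return sizes


def algbw_gbps(size_bytes: int, seconds: float) -> float:
    """Bytes moved per second, in GB/s (1 GB = 1e9 bytes).

    Assumption: for a single send/recv, bus bandwidth equals this value.
    """
    return size_bytes / seconds / 1e9


def compare(nccl_times: Dict[int, float], ucx_times: Dict[int, float]) -> Dict[int, str]:
    """For every size measured by both libraries, name the one with higher bandwidth.

    Inputs map message size in bytes to measured time in seconds.
    Ties are reported as "NCCL".
    """
    winners = {}
    for size in sorted(set(nccl_times) & set(ucx_times)):
        nccl_bw = algbw_gbps(size, nccl_times[size])
        ucx_bw = algbw_gbps(size, ucx_times[size])
        winners[size] = "NCCL" if nccl_bw >= ucx_bw else "UCX"
    return winners
```

Running the same sweep separately on each transport keeps the results comparable, and looking at small and large sizes separately shows whether one library wins on latency-bound messages and the other on bandwidth-bound ones.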