NCCL (pronounced "Nickel") provides collective operations on NVIDIA GPUs, namely all-reduce, all-gather, reduce-scatter, reduce, and broadcast. These are similar to the collectives offered by MPI, but operate at the level of a single node.
The collectives are controlled from the host side and operate directly between GPU memories; the developers claim to achieve high bandwidth over PCIe. Combining NCCL with an internode communication mechanism and GPU memory copies can therefore provide point-to-point (P2P) and collective operations between all GPU memories of an HPC or distributed system.
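
To make the semantics of the five collectives concrete, the following is a minimal pure-Python model in which each GPU buffer is represented by a plain list. It is only a sketch of what each operation leaves in each buffer and does not call NCCL or touch a GPU. All function names are hypothetical, and the three-step scheme in hierarchical_all_reduce is one assumed way of combining an intranode collective with an internode exchange, not a method prescribed by NCCL.

```python
from functools import reduce as fold
from operator import add


def _elementwise(buffers, op):
    """Combine all buffers element by element with the binary operator op."""
    return [fold(op, column) for column in zip(*buffers)]


def reduce_to_root(buffers, root, op=add):
    """reduce: only the root buffer receives the combined result."""
    result = _elementwise(buffers, op)
    return [result if rank == root else list(buf)
            for rank, buf in enumerate(buffers)]


def broadcast(buffers, root):
    """broadcast: every buffer receives a copy of the root buffer."""
    return [list(buffers[root]) for _ in buffers]


def all_reduce(buffers, op=add):
    """all-reduce: every buffer receives the combined result."""
    result = _elementwise(buffers, op)
    return [list(result) for _ in buffers]


def all_gather(buffers):
    """all-gather: every buffer receives the concatenation of all buffers."""
    gathered = [value for buf in buffers for value in buf]
    return [list(gathered) for _ in buffers]


def reduce_scatter(buffers, op=add):
    """reduce-scatter: rank r receives the r-th chunk of the combined result.

    Assumes the buffer length is divisible by the number of ranks.
    """
    result = _elementwise(buffers, op)
    chunk = len(result) // len(buffers)
    return [result[r * chunk:(r + 1) * chunk] for r in range(len(buffers))]


def hierarchical_all_reduce(nodes, op=add):
    """All-reduce over every GPU of every node (assumed three-step scheme).

    nodes is a list of nodes; each node is a list of GPU buffers.
    """
    # Step 1: intranode reduce onto GPU 0 of each node (the NCCL part).
    local = [reduce_to_root(gpus, root=0, op=op) for gpus in nodes]
    # Step 2: internode all-reduce between the GPU 0 buffers of all nodes
    # (in practice this would go through GPU memory copies and the network).
    roots = all_reduce([gpus[0] for gpus in local], op=op)
    # Step 3: intranode broadcast of the global result from GPU 0.
    return [broadcast([root_buf] + gpus[1:], root=0)
            for root_buf, gpus in zip(roots, local)]
```

With two nodes holding two GPUs each, hierarchical_all_reduce([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) leaves the elementwise sum of all four buffers in every GPU buffer, which matches what a flat all-reduce over the four buffers would produce.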