On my H100 platform, each H100 GPU is paired with a BlueField-3 SuperNIC.
I found that DOCA GPUNetIO combines technologies such as GPUDirect RDMA and GPUDirect Async to enable GPU-centric applications, in which a CUDA kernel communicates directly with the network interface card (NIC) to send and receive packets, bypassing the CPU and removing it from the critical path. If NCCL used this, it would not need the CPU proxy thread to communicate with the NIC.
So my question is: does NCCL support DOCA GPUNetIO?
https://developer.nvidia.com/blog/unlocking-gpu-accelerated-rdma-with-nvidia-doca-gpunetio/
https://docs.nvidia.com/doca/sdk/doca+gpunetio/index.html