Open slavonnet opened 1 month ago
There were other people asking for RDMA support in a recent discussion as well. I don't have such hardware but it's nice to see there is software emulation.
I will try to spend some cycles on this in the near term. Patches are also welcome.
> Patches are also welcome
I can't write a patch from scratch, but I may be able to fix bugs in the future. Unfortunately, I have no free time right now.
I will be able to test CPU inference on 3 servers (512 GB of memory each, Mellanox ConnectX-3 adapters, and an InfiniBand switch).
Next year I plan to have several servers, each with 3x 4060 Ti GPUs.
This issue was closed because it has been inactive for 14 days since being marked as stale.
@rgerganov Please reopen. The bot closed this issue automatically.
Prerequisites
Feature Description
The regular network stack adds latency and uses a small frame size. With RDMA, it should be possible to scale to hundreds of backends while keeping performance close to that of a single server.
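To make the latency argument concrete, here is a rough back-of-the-envelope model in Python. It assumes the layers are split across backends so that each generated token crosses a fixed number of backend boundaries, and that each crossing costs one network latency plus the transfer time of the payload. The function name and all numeric values are placeholders chosen for illustration, not measurements.

```python
def sync_overhead_seconds(n_hops: int, latency_s: float,
                          payload_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Estimate the time spent per token moving results across n_hops backend boundaries."""
    # Each hop pays a fixed latency plus the time to push the payload over the link.
    per_hop = latency_s + payload_bytes / bandwidth_bytes_per_s
    return n_hops * per_hop


# All values below are hypothetical placeholders, not benchmark results.
payload = 8192 * 4           # assumed payload: 8192 float32 values per hop
link = 1.25e9                # assumed link speed: 10 Gbit/s expressed in bytes/s
tcp_like = sync_overhead_seconds(3, 50e-6, payload, link)   # assumed 50 us per hop
rdma_like = sync_overhead_seconds(3, 2e-6, payload, link)   # assumed 2 us per hop

print(f"TCP-like:  {tcp_like * 1e6:.1f} us per token")
print(f"RDMA-like: {rdma_like * 1e6:.1f} us per token")
```

With small payloads the fixed per-hop latency dominates the total, which is why lowering it matters more as the number of backends grows.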
Motivation
It would be good to synchronize per-layer execution results between backends via RDMA to reduce latency.
Possible Implementation
If you do not have RDMA-capable hardware, you can use the RXE kernel module (Soft-RoCE) for emulation:
https://enterprise-support.nvidia.com/s/article/howto-configure-soft-roce
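After setting up Soft-RoCE, a quick way to confirm that the kernel exposes an RDMA device is to look under `/sys/class/infiniband`. The Python sketch below does only that. The helper name `list_rdma_devices` and the device and interface names in the comments (`rxe0`, `eth0`) are assumptions for illustration; your names will likely differ.

```python
from pathlib import Path


def list_rdma_devices(sysfs_root: str = "/sys/class/infiniband") -> list[str]:
    """Return the names of RDMA devices registered with the kernel, sorted."""
    root = Path(sysfs_root)
    if not root.is_dir():
        # Directory is absent when no RDMA driver (hardware or RXE) is loaded.
        return []
    return sorted(entry.name for entry in root.iterdir())


# Typical Soft-RoCE setup, run as root before this check (names are examples):
#   modprobe rdma_rxe
#   rdma link add rxe0 type rxe netdev eth0
devices = list_rdma_devices()
if devices:
    print("RDMA devices:", ", ".join(devices))
else:
    print("No RDMA devices found")
```

This only shows that a device is registered. It does not verify that traffic actually flows between hosts.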