SoftRoCE / librxe-dev

Development repository for RXE user space code.

rxe_cfg network namespace support #12

Open ziyin-dl opened 6 years ago

ziyin-dl commented 6 years ago

I recently had an issue with adding an interface that lives in a separate network namespace. The setup is as follows: I have two servers running Ubuntu 16.04 with kernel 4.14. Each server has four network interfaces, all Intel 1000BASE-T NICs (I211). I am running Soft-RoCE on both machines and installed the user-space libraries as instructed. I verified that Soft-RoCE works by running ibv_rc_pingpong between the two servers.

However, what I would like to do is create a couple of containers on each server, where each container has its own interface with Soft-RoCE running on top of it. I want the containers to have separate interfaces to avoid Linux internal routing and for load-balancing purposes. To do this, I created a network namespace for each container and moved the physical interfaces into their corresponding namespaces. When I try adding the NICs using rxe_cfg add, it throws an error:
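The setup steps described above can be sketched as follows (the interface name p3p1 follows the thread; the namespace name ns1 and the address are illustrative placeholders):

```shell
# Create a network namespace for the container (ns1 is a hypothetical name).
sudo ip netns add ns1

# Move the physical interface into the namespace; it disappears from the
# default namespace's device list.
sudo ip link set p3p1 netns ns1

# Re-assign the address and bring the link up inside the namespace
# (addresses are flushed when an interface changes namespace).
sudo ip netns exec ns1 ip addr add 192.168.10.1/24 dev p3p1
sudo ip netns exec ns1 ip link set p3p1 up

# Try to create the Soft-RoCE device on top of it, inside the namespace.
sudo ip netns exec ns1 rxe_cfg add p3p1
```

The last step is where the error is raised.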

$ sudo ip netns exec $(PID) rxe_cfg add p3p1
[ 3100.844015] rdma_rxe: interface p3p1 not found
sh: echo: I/O error

(p3p1 is the interface I moved to PID's namespace)

However, it seems that rxe_cfg status can correctly identify the device in the namespace:

$ sudo ip netns exec $(PID) rxe_cfg status
  Name  Link  Driver  Speed  NMTU  IPv4_addr     RDEV  RMTU
  p3p1  yes   igb            1500  192.168.10.1

Does Soft-RoCE work in this setting, or did I miss something in the setup? If it is doable, what is the right way to have separate Soft-RoCE devices for different namespaces?

G3orge26 commented 4 years ago

Hi, I understand this issue has been left hanging for quite some time, but I am having the same problem and was wondering if anyone has found a solution or workaround. Thank you very much in advance.

qmaldon commented 3 years ago

Hi, it's 2021. Is there any update or workaround on this issue? I'm interested in running Soft-RoCE on a virtual network of Docker containers. Thanks.

ziyin-dl commented 3 years ago

I eventually got it working, but that was back in 2017, so I can't remember the details at this moment. Basically, the issue is that the Soft-RoCE code only searches for devices in the default network namespace, so devices in a different network namespace (e.g. in a container) cannot be discovered. You have to change the code so that it looks for the device in the right network namespace.

Some code related to IPv4/v6 routing has to be changed too. This is also caused by the network namespace issue (each namespace has its own routing table).

Once these two are fixed the code should be good to go.
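The discovery problem described above can be observed directly from the shell: ibv_devices (from libibverbs) lists the RDMA devices visible in the current namespace (ns1 is a hypothetical namespace name):

```shell
# RDMA devices visible in the default namespace.
ibv_devices

# RDMA devices visible inside the namespace. On kernels where rdma_rxe only
# searches the default namespace, no usable device shows up here even though
# the underlying NIC lives in this namespace.
sudo ip netns exec ns1 ibv_devices
```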

qmaldon commented 3 years ago

I managed to get it partially working, with a single Soft-RoCE device shared by multiple virtual network interfaces. I'm using Docker containers and creating bridge networks, so each container has its own virtual interface.

I can see it working, but all requests and responses go through the same Soft-RoCE device.

If I test connectivity from rx0 to rx1, it fails, even from the main host (no container).
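For readers hitting this on newer systems: since roughly kernel 5.2, the RDMA subsystem has its own network-namespace mode that can be switched with the iproute2 rdma tool, which may make the source changes discussed above unnecessary. A sketch, assuming a kernel and iproute2 new enough to support it (rxe0 and ns1 are hypothetical names):

```shell
# Switch the RDMA subsystem from the default shared mode to exclusive,
# per-namespace mode (must be done before RDMA devices are in use).
sudo rdma system set netns exclusive

# Move an rxe device into a namespace.
sudo rdma dev set rxe0 netns ns1

# Verify the current mode.
sudo rdma system show netns
```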