erpc-io / eRPC

Efficient RPCs for datacenter networks
https://erpc.io/

Can't select from multiple net devices when using Infiniband #93

Closed Stuart0l closed 1 year ago

Stuart0l commented 1 year ago

Problem

If Infiniband is chosen as the transport and multiple NICs are available, there is no way to explicitly specify which NIC to use.

Expected Behavior

I would like to be able to specify which network device to use.

Detail

According to the code here in verbs_common.h/void common_resolve_phy_port(), the first device that has the specified phy_port is chosen: https://github.com/erpc-io/eRPC/blob/1ef5d5f3b095776bba4ed21a31cc804211136f16/src/transport_impl/verbs_common.h#L129

However, a machine may have multiple NICs available. For example, a CloudLab xl170 instance has 4 devices (2 with an active port):

$ ibv_devinfo
hca_id: mlx5_0
        transport:                      InfiniBand (0)
        fw_ver:                         14.18.2030
        node_guid:                      98f2:b3ff:ffca:6090
        sys_image_guid:                 98f2:b3ff:ffca:6090
        vendor_id:                      0x02c9
        vendor_part_id:                 4117
        hw_ver:                         0x0
        board_id:                       HP_2690110034
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

hca_id: mlx5_1
        transport:                      InfiniBand (0)
        fw_ver:                         14.18.2030
        node_guid:                      98f2:b3ff:ffca:6091
        sys_image_guid:                 98f2:b3ff:ffca:6090
        vendor_id:                      0x02c9
        vendor_part_id:                 4117
        hw_ver:                         0x0
        board_id:                       HP_2690110034
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

hca_id: mlx5_2
        transport:                      InfiniBand (0)
        fw_ver:                         14.18.2030
        node_guid:                      9cdc:71ff:ff5d:d570
        sys_image_guid:                 9cdc:71ff:ff5d:d570
        vendor_id:                      0x02c9
        vendor_part_id:                 4117
        hw_ver:                         0x0
        board_id:                       HP_2420110034
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

hca_id: mlx5_3
        transport:                      InfiniBand (0)
        fw_ver:                         14.18.2030
        node_guid:                      9cdc:71ff:ff5d:d571
        sys_image_guid:                 9cdc:71ff:ff5d:d570
        vendor_id:                      0x02c9
        vendor_part_id:                 4117
        hw_ver:                         0x0
        board_id:                       HP_2420110034
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

Thus, the first device, mlx5_0, will be chosen. However, I would like to choose the fourth device, mlx5_3, since mlx5_0 is connected to the public network while mlx5_3 is connected to the internal network:

DEV     PORT    INDEX   GID                                     IPv4            VER     DEV
---     ----    -----   ---                                     ------------    ---     ---
mlx5_0  1       0       fe80:0000:0000:0000:9af2:b3ff:feca:6090                 v1      enp7s0f0
mlx5_0  1       1       fe80:0000:0000:0000:9af2:b3ff:feca:6090                 v2      enp7s0f0
mlx5_0  1       2       0000:0000:0000:0000:0000:ffff:806e:dae5 128.110.218.229 v1      enp7s0f0
mlx5_0  1       3       0000:0000:0000:0000:0000:ffff:806e:dae5 128.110.218.229 v2      enp7s0f0
mlx5_1  1       0       fe80:0000:0000:0000:9af2:b3ff:feca:6091                 v1      enp7s0f1
mlx5_1  1       1       fe80:0000:0000:0000:9af2:b3ff:feca:6091                 v2      enp7s0f1
mlx5_2  1       0       fe80:0000:0000:0000:9edc:71ff:fe5d:d570                 v1      ens1f0
mlx5_2  1       1       fe80:0000:0000:0000:9edc:71ff:fe5d:d570                 v2      ens1f0
mlx5_3  1       0       fe80:0000:0000:0000:9edc:71ff:fe5d:d571                 v1      ens1f1
mlx5_3  1       1       fe80:0000:0000:0000:9edc:71ff:fe5d:d571                 v2      ens1f1
mlx5_3  1       2       0000:0000:0000:0000:0000:ffff:0a0a:0101 10.10.1.1       v1      ens1f1
mlx5_3  1       3       0000:0000:0000:0000:0000:ffff:0a0a:0101 10.10.1.1       v2      ens1f1

I need to use the one connected to the internal network because 1) the public network only has 10 Gbps of bandwidth while the internal one has 25 Gbps, and 2) CloudLab doesn't allow me to put heavy traffic on the public network.
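
A side note on reading the GID table above: the entries that carry an IPv4 address are IPv4-mapped GIDs, so the last two 16-bit groups are just the address bytes in hex (0a0a:0101 is 10.10.1.1). The sketch below decodes that form. The helper name gid_to_ipv4 is made up for illustration and is not part of eRPC or libibverbs; it assumes the fully expanded 8-group notation shown in the table.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not an eRPC or libibverbs API).
// If `gid` is an IPv4-mapped GID of the form
// 0000:0000:0000:0000:0000:ffff:AABB:CCDD, return the dotted-quad IPv4
// address. Return an empty string for any other GID (e.g. link-local fe80::).
std::string gid_to_ipv4(const std::string &gid) {
  std::vector<unsigned> groups;
  std::istringstream in(gid);
  std::string tok;
  while (std::getline(in, tok, ':')) {
    // Assumption: fully expanded notation, 1 to 4 hex digits per group.
    if (tok.empty() || tok.size() > 4) return "";
    for (char c : tok) {
      if (!std::isxdigit(static_cast<unsigned char>(c))) return "";
    }
    groups.push_back(static_cast<unsigned>(std::stoul(tok, nullptr, 16)));
  }
  if (groups.size() != 8) return "";

  // IPv4-mapped prefix: five zero groups followed by ffff.
  for (std::size_t i = 0; i < 5; i++) {
    if (groups[i] != 0) return "";
  }
  if (groups[5] != 0xffff) return "";

  // The last two groups hold the four address bytes, high byte first.
  char buf[16];
  std::snprintf(buf, sizeof(buf), "%u.%u.%u.%u", groups[6] >> 8,
                groups[6] & 0xff, groups[7] >> 8, groups[7] & 0xff);
  return buf;
}
```

Calling gid_to_ipv4 on the mlx5_3 index-2 GID gives the internal address, while the link-local fe80 entries yield an empty string, which is one way to tell which device sits on which network.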

Thanks for your help!

anujkaliaiitd commented 1 year ago

Hi. The example apps have a ports parameter that can be used for this, e.g., https://github.com/erpc-io/eRPC/blob/1ef5d5f3b095776bba4ed21a31cc804211136f16/apps/masstree_analytics/config#L11

For InfiniBand, phy_port should be the zero-based index of the NIC port as listed by ibv_devinfo. I don't remember whether the eRPC code counts only ACTIVE ports or all ports.

For mlx5_3, phy_port should be either 1 or 3 -- you can try both.

Stuart0l commented 1 year ago

Hi Anuj,

Thanks for your response. However, the phy_port parameter can't specify which device to use; it can only specify which port on that device to use. A server can have multiple devices, and a device can have multiple ports, as shown here (although this device only has 1 port):

DEV     PORT    INDEX   GID                                     IPv4            VER     DEV
---     ----    -----   ---                                     ------------    ---     ---
mlx5_0  1       0       fe80:0000:0000:0000:9af2:b3ff:feca:6090                 v1      enp7s0f0

VerbsResolve has both dev_id and dev_port_id fields. The phy_port param specifies the latter (and yes, it only counts active ports) but not the former. https://github.com/erpc-io/eRPC/blob/1ef5d5f3b095776bba4ed21a31cc804211136f16/src/transport_impl/verbs_common.h#L184-L186

I've tried setting phy_port to be 1 or 3, but it doesn't work, since there is no device on the server that has a 2nd or a 4th port.

anujkaliaiitd commented 1 year ago

Hm, perhaps something broke, but there's a loop in the code over all devices. In the past, I've tested on servers with multiple multi-port NICs.

phy_port is supposed to be a global index across all devices and ports. So if there are two devices with two ports each, setting phy_port = 2 should pick the first port on the second device.

https://github.com/erpc-io/eRPC/blob/1ef5d5f3b095776bba4ed21a31cc804211136f16/src/transport_impl/verbs_common.h#L140

which counts up to phy_port:

https://github.com/erpc-io/eRPC/blob/1ef5d5f3b095776bba4ed21a31cc804211136f16/src/transport_impl/verbs_common.h#L165

Stuart0l commented 1 year ago

Hi Anuj,

I think I misunderstood the logic before, and now I understand it correctly. I tested with phy_port=1 and it works in my case. Thank you so much for your explanation!
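
For anyone who lands here with the same confusion, the indexing discussed in this thread can be modeled roughly as below. This is a simplified sketch, not the eRPC code: it assumes phy_port is a zero-based global index over devices and ports in enumeration order, and that only active ports are counted, as described above. The Device and Resolved types and the resolve_phy_port and xl170_devices functions are made up for illustration; the real logic lives in verbs_common.h and uses libibverbs queries.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for one RDMA device. port_active[i] is the state of
// port number i + 1 (ibv_devinfo prints ports starting at 1).
struct Device {
  std::string name;
  std::vector<bool> port_active;
};

// Result of the lookup; `found` is false if phy_port is out of range.
struct Resolved {
  bool found;
  std::string dev_name;
  int dev_port_id;  // 1-based port number on that device
};

// Hypothetical model of the lookup (not the actual eRPC function):
// walk all devices in order, skip inactive ports, and return the active
// port whose zero-based global index equals phy_port.
Resolved resolve_phy_port(const std::vector<Device> &devs,
                          std::size_t phy_port) {
  std::size_t active_seen = 0;
  for (const Device &dev : devs) {
    for (std::size_t p = 0; p < dev.port_active.size(); p++) {
      if (!dev.port_active[p]) continue;  // Inactive ports are not counted
      if (active_seen == phy_port) {
        return Resolved{true, dev.name, static_cast<int>(p + 1)};
      }
      active_seen++;
    }
  }
  return Resolved{false, "", 0};
}

// The four single-port devices from the ibv_devinfo output in this issue:
// mlx5_0 and mlx5_3 are PORT_ACTIVE, mlx5_1 and mlx5_2 are PORT_DOWN.
std::vector<Device> xl170_devices() {
  return {{"mlx5_0", {true}},
          {"mlx5_1", {false}},
          {"mlx5_2", {false}},
          {"mlx5_3", {true}}};
}
```

Under these assumptions, phy_port = 0 resolves to mlx5_0 and phy_port = 1 resolves to mlx5_3, which matches what worked on the xl170 node; any larger index finds nothing because only two ports are active.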