erpc-io / eRPC

Efficient RPCs for datacenter networks

Route resolution failure on Client + Segfault on Mellanox NIC #48

Open vsag96 opened 4 years ago

vsag96 commented 4 years ago

Hi,

- OFED version: 5.0.2
- NIC: Mellanox ConnectX-5
- OS: Ubuntu 18
- A Tofino switch does simple packet forwarding for us.

If I build with `-DTRANSPORT=infiniband` and `-DROCE=on`, the build succeeds, but when I run the hello world app I get the following error on the client side:

```
Received connect response from [H: 192.168.1.4:31850, R: 0, S: XX] for session 0. Issue: Error [Routing resolution failure]
```

The server receives the initial connect packet from the client, then the client segfaults and the server prints the statement below in a loop. The error on the server is as follows:

```
Received connect request from [H: 192.168.1.3:31850, R: 0, S: 0]. Issue: Unable to resolve routing info [LID: 0, QPN: 449, GID interface ID 16601820732604482458, GID subnet prefix 33022]. Sending response.
```

In the README, you mention using `-DTRANSPORT=raw` for Mellanox NICs, but I was not able to build with that flag (error trace attached). We want to use eRPC over RoCEv2 + DCQCN; we are okay with IB, unless you tell us otherwise. The RDMA devices show up in `rdma link` and `ibdev2netdev`.

anujkaliaiitd commented 4 years ago

Hi. Thanks for reporting this issue.

Can you confirm whether `ib_read_bw` works over RoCE?

vsag96 commented 4 years ago

I started `ib_read_bw` on the server and ran `ib_read_bw` against it from the client. Going by the output, I believe it works. Nonetheless, I am attaching the trace, as I am just starting out with running RDMA applications.

The server-side trace:

```
                    RDMA_Read BW Test
 Dual-port       : OFF
 Device          : mlx5_0
 Number of qps   : 1
 Transport type  : IB
 Connection type : RC
 Using SRQ       : OFF
 PCIe relax order: ON
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet

 local address:  LID 0000 QPN 0x02c9 PSN 0x2df5d7 OUT 0x10 RKey 0x003572 VAddr 0x007f65b20e9000
                 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:04
 remote address: LID 0000 QPN 0x0239 PSN 0x2d5540 OUT 0x10 RKey 0x003582 VAddr 0x007f1a3666e000
                 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:03

 bytes  #iterations  BW peak[MB/sec]  BW average[MB/sec]  MsgRate[Mpps]
 65536  1000         10222.53         6831.89             0.109310
```

On the client:

```
                    RDMA_Read BW Test
 Dual-port       : OFF
 Device          : mlx5_0
 Number of qps   : 1
 Transport type  : IB
 Connection type : RC
 Using SRQ       : OFF
 PCIe relax order: ON
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet

 local address:  LID 0000 QPN 0x0239 PSN 0x2d5540 OUT 0x10 RKey 0x003582 VAddr 0x007f1a3666e000
                 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:03
 remote address: LID 0000 QPN 0x02c9 PSN 0x2df5d7 OUT 0x10 RKey 0x003572 VAddr 0x007f65b20e9000
                 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:04

 Conflicting CPU frequency values detected: 1178.116000 != 800.186000. CPU Frequency is not max.

 bytes  #iterations  BW peak[MB/sec]  BW average[MB/sec]  MsgRate[Mpps]
 65536  1000         10222.53         6831.89             0.109310
```

anujkaliaiitd commented 4 years ago

Thanks for the details.

Could you please test if eRPC works on your setup with older Mellanox drivers (e.g., Mellanox OFED 4.4)? There have been a lot of recent NIC driver changes and I've not kept the code up to date.

I am aware that eRPC doesn't build anymore with the Raw transport with new Mellanox OFED versions (or rdma_core) because the ibverbs API has changed. I plan to fix this eventually but I'm not sure when I'll have the time.

Dicridon commented 2 years ago

Hi Dr. Kalia, I encountered the same issue. I have one 4-node cluster and one 2-node cluster; the former is equipped with CX5 NICs and the latter with CX4 NICs. eRPC runs well within each cluster, but not between them.

I use the 4-node cluster as servers and the 2-node cluster as clients. On the client side I get:

```
96:964338 WARNG: Rpc 0: Received connect response from [H: 10.0.0.40:31851, R: 0, S: XX] for session 0. Issue: Error [Routing resolution failure].
```

At first I thought it was due to invalid LIDs (`ibv_devinfo` shows all ports' LIDs are 0, which is invalid), but since eRPC worked within each cluster, the zero LIDs were probably fine. Then I checked eRPC's source code and noticed that eRPC did not seem able to create an address handle (AH) in `IBTransport::create_ah`. So I suspected the two clusters couldn't communicate over UD, but both `ib_send_bw -c UD` and `ib_read_bw` worked.

Could you give any advice for further troubleshooting?

anujkaliaiitd commented 2 years ago

Hi! The verbs address handle creation process is a bit complex so it's likely I missed something in my implementation of create_ah. The implementation is different for RoCE and InfiniBand (see https://github.com/erpc-io/eRPC/blob/75e3015d17fa4693427487dbc783dc01249c36df/src/transport_impl/infiniband/ib_transport.cc#L74), so I assume you're passing -DROCE=on if you're using RoCE.
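Roughly, the distinction looks like this (a simplified sketch rather than eRPC's exact code; the `make_ah` helper and its arguments are just for illustration):

```cpp
#include <cstring>
#include <infiniband/verbs.h>

// Simplified sketch of verbs address handle creation. For RoCE the GRH is
// mandatory, and both the remote GID and the local GID index must be valid;
// for InfiniBand the destination LID alone is usually enough. A nullptr
// return from ibv_create_ah is the kind of failure that would surface as the
// "Routing resolution failure" seen above.
ibv_ah *make_ah(ibv_pd *pd, uint8_t port_num, bool is_roce,
                uint16_t remote_lid, const ibv_gid &remote_gid,
                uint8_t local_gid_index) {
  ibv_ah_attr attr;
  memset(&attr, 0, sizeof(attr));
  attr.port_num = port_num;

  if (is_roce) {
    attr.is_global = 1;                     // GRH required for RoCE
    attr.grh.dgid = remote_gid;             // remote GID from the handshake
    attr.grh.sgid_index = local_gid_index;  // must index a valid local GID entry
    attr.grh.hop_limit = 1;
  } else {
    attr.is_global = 0;
    attr.dlid = remote_lid;                 // InfiniBand routes on the LID
  }
  return ibv_create_ah(pd, &attr);
}
```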

My suggestion to fix this would be to see how the perftest package implements address handle resolution, and use that information to try fixing eRPC's create_ah.

Dicridon commented 2 years ago

Hi Dr. Kalia, sorry for my late reply; I was on holiday and then spent some time tracking down the create_ah issue.

Thanks to your precise analysis, I was able to find that the routing resolution failure is caused by mismatched GIDs. In https://github.com/erpc-io/eRPC/blob/d35a86dcf92757b77ff187f15f7bf67a4ebc0221/src/transport_impl/infiniband/ib_transport.cc#L18 , the hard-coded kDefaultGIDIndex works most of the time, but unluckily my two clusters have different NIC configurations. With RoCE enabled, the default value picks the wrong GID in one cluster, so the two clusters fail to communicate.

The reason `ib_send_bw -c UD` works is that it requires users to supply a GID index and device ID, so it always gets the correct GID. Maybe it would also be a good idea for eRPC to let users supply an optional, valid GID index?
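For reference, something along these lines could pick an IPv4-mapped GID at runtime instead of hard-coding kDefaultGIDIndex (a rough sketch only, not eRPC code; the `pick_ipv4_gid_index` name is made up, and it does not distinguish RoCE v1 from RoCE v2 entries):

```cpp
#include <cstring>
#include <infiniband/verbs.h>

// Scan the port's GID table for the first IPv4-mapped entry
// (::ffff:a.b.c.d), which is what RoCE over IPv4 uses.
// Returns the GID index, or -1 if none is found.
int pick_ipv4_gid_index(ibv_context *ctx, uint8_t port_num) {
  ibv_port_attr port_attr;
  if (ibv_query_port(ctx, port_num, &port_attr) != 0) return -1;

  for (int i = 0; i < port_attr.gid_tbl_len; i++) {
    ibv_gid gid;
    if (ibv_query_gid(ctx, port_num, i, &gid) != 0) continue;

    // IPv4-mapped GIDs start with ten zero bytes followed by 0xff 0xff.
    const uint8_t v4_prefix[12] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff};
    if (memcmp(gid.raw, v4_prefix, sizeof(v4_prefix)) == 0) return i;
  }
  return -1;  // no IPv4-mapped GID; fall back to a user-supplied index
}
```

An explicit user-supplied index would still be the safest option, since a port can expose several IPv4-mapped entries (e.g., for different RoCE versions or VLANs).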