erpc-io / eRPC

Efficient RPCs for datacenter networks
https://erpc.io/

apps/small_rpc_tput failed with modified libmlx5 driver #4

Closed. lastweek closed this issue 5 years ago.

lastweek commented 6 years ago

Hi Anuj,

I was trying out the modified mlx5 driver with OFED 4.4. When linked against this modified driver, eRPC fails with: eRPC: Fatal error. Bad wc status 19.

Compared with drivers/4.2/libmlx4, it looks like some code is missing from both drivers/4.4/libmlx5 and drivers/4.2/libmlx5. My question is: have you fully tested the modified mlx5 driver? Or should I start looking into the mlx4 one to port whatever is missing?

anujkaliaiitd commented 6 years ago

Hi Yizhou. Thanks for trying out the code.

The modified driver isn't fully tested, although it works for all of our benchmarks. It gets rid of some unneeded code branches for a small performance boost (about 10% higher small-RPC rate). You can see the deletions by diffing drivers/4.4/libmlx5-41mlnx1 and drivers/4.4/orig/libmlx5-41mlnx1 (the latter is the original driver code that ships with Mellanox OFED).

Can you share more details about your setup? Which NIC are you using, and what test fails? Does it work with the original driver? Also, it's easiest for me if you can reproduce the bug on CloudLab.

lastweek commented 6 years ago

Hi Anuj,

Thank you for your reply. My test platform has one ConnectX-4 NIC, which uses the mlx5 driver. I'm trying to run apps/small_rpc_tput, which works with the original OFED drivers and gives reasonable performance numbers.

Then I wanted to try the modified driver for its performance optimizations. I tried two ways of using it: 1) LD_PRELOAD, and 2) replacing usr/lib/libmlx5... I'm pretty sure the modified driver is loaded.

The reason I ask is that the diff of mlx5 against its original is somewhat smaller than the diff of mlx4. For example, during testing I added the following missing code to mlx5_post_recv() to pass eRPC's modded-driver check:

        if (wr == NULL && (*bad_wr) != NULL &&
            (*bad_wr)->wr_id == ERPC_MODDED_PROBE_WRID) {
                /* Tell the caller that this is a modded driver */
                return ERPC_MODDED_PROBE_RET;
        } 

If the modified driver works for you, I guess the cause is something else. I'm using linux-4.4.98. Which version did you use for testing?

Sorry, I don't have access to CloudLab, so I have to use my current platform.

anujkaliaiitd commented 6 years ago

I tried out the current version with ConnectX-4 Lx cards (Ethernet), and it seems to work. I'm using Ubuntu 18.04, so kernel 4.15.

Are you using InfiniBand by any chance? I suspect this because eRPC's Ethernet transport currently does not probe for a modded driver (it should!) since the original driver is sufficient for fast RECVs because of multi-packet RQs. The optimized libmlx5 driver is for Ethernet only. Supporting InfiniBand will require a few small changes.

lastweek commented 6 years ago

Hi Anuj,

I'm using ConnectX-4 (Ethernet). This is my device info:

hca_id: mlx5_1
        transport:                      InfiniBand (0)
        ...
        Device ports:
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

d8:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
        Subsystem: Mellanox Technologies MT27700 Family [ConnectX-4]

And the Makefile is generated by:

cmake . -DPERF=ON -DTRANSPORT=raw

Correct me if I'm doing anything wrong above. If this is correct, I suspect the kernel version matters, because the user-level driver and the kernel driver may be mismatched (damn those "well documented" mlx drivers; I've had a lot of versioning issues with them before).

Anyway, what do you mean by "the original driver is sufficient for fast RECVs because of multi-packet RQs"? Do you mean that, given my hardware, the original driver can deliver "almost" the same performance as the modified one?

anujkaliaiitd commented 6 years ago

Mellanox OFED updates the kernel drivers and device firmware, so the kernel version may not be the issue here. One likely difference is that I've only tried ConnectX-4 "Lx", which is a different NIC from ConnectX-4.

For mlx5-based Ethernet, the performance difference between the modified and original driver is around 10% for single-core small RPC rate, so it doesn't matter much. The difference is likely even lower with multiple cores.

In case you want to dig into this, there is an easy way to pinpoint the bug since the original driver works. You can keep reducing the diff, and see where the break happens. For example, you can revert cq.c and doorbell.h to see if the problem is in qp.c. If that's the case, revert code sections in qp.c. The edits to these three files are independent.

lastweek commented 6 years ago

Cool. I guess there's no other way than reducing the diff. I will let you know how this goes. Thanks.

lastweek commented 6 years ago

Hi Anuj,

I followed your suggestion and tried reverting. Unfortunately, none of the reverts works.

After reverting cq.c and doorbell.h, I get this:

mlx5: sc2-hs2-b1624: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 8d006801 0a000902 0000c8d2
eRPC: Fatal error. Bad wc status 1.

After reverting qp.c, the kernel's mlx5_core got stuck! I had to reboot the machine.

I guess it's due to 1) the NIC difference, or 2) a difference in OFED driver and libmlx5 versions. Since I'm no expert in IB internals, I won't continue debugging this. Thank you for the help along the way.

anujkaliaiitd commented 6 years ago

Thanks for trying. The optimized drivers are a pain point and I want to get rid of them eventually. I will give this a shot if I get access to ConnectX-4 NICs.