Mellanox / libxlio

No traffic sent on Bluefield-2 #135

Open mpodles opened 2 months ago

mpodles commented 2 months ago

Subject

Dear Team,

I'm trying to benchmark NVMe-oF over TCP/IP with and without XLIO. I can get iperf and spdk_perf working between the machines, but when XLIO is used, no traffic comes out of the initiator. Nothing is visible in tcpdump, in the ethtool stats, on the switch between the machines, or on the target machine.

An example packet that spdk_perf tries to send is an ARP request:

#0 qp_mgr_eth_mlx5::fill_wqe (this=0xaaaaaada6010, pswr=) at dev/qp_mgr_eth_mlx5.cpp:485
#1 0x0000fffff7024db0 in qp_mgr_eth_mlx5::send_to_wire (this=0xaaaaaada6010, p_send_wqe=, attr=, request_comp=, tis=, credits=) at dev/qp_mgr_eth_mlx5.cpp:748
#2 0x0000fffff7020ac8 in qp_mgr::send (this=0xaaaaaada6010, p_send_wqe=p_send_wqe@entry=0xaaaaaada4510, attr=attr@entry=0, tis=tis@entry=0x0, credits=credits@entry=2) at dev/qp_mgr.cpp:611
#3 0x0000fffff704d9b0 in ring_simple::send_buffer (tis=0x0, attr=, p_send_wqe=0xaaaaaada4510, this=0xaaaaaada4920) at dev/ring_simple.cpp:746
#4 ring_simple::send_ring_buffer (this=0xaaaaaada4920, id=, p_send_wqe=0xaaaaaada4510, attr=) at dev/ring_simple.cpp:776
#5 0x0000fffff7077794 in neigh_eth::send_arp_request (this=0xaaaaaada4370, is_broadcast=) at proto/neighbour.cpp:1661
#6 0x0000fffff70724a4 in neigh_entry::send_discovery_request (this=0xaaaaaada4370) at proto/neighbour.cpp:393

After this, it successfully gets a completion in:

#0 cq_mgr_mlx5::poll_and_process_element_tx (this=0xaaaaaada63d0, p_cq_poll_sn=0xffffffffd680) at dev/cq_mgr_mlx5.cpp:542
#1 0x0000fffff7020a68 in qp_mgr::send (this=0xaaaaaada60d0, p_send_wqe=p_send_wqe@entry=0xaaaaaada4700, attr=attr@entry=0, tis=tis@entry=0x0, credits=credits@entry=2) at dev/qp_mgr.cpp:605
#2 0x0000fffff704d9b0 in ring_simple::send_buffer (tis=0x0, attr=, p_send_wqe=0xaaaaaada4700, this=0xaaaaaada4b10) at dev/ring_simple.cpp:746
#3 ring_simple::send_ring_buffer (this=0xaaaaaada4b10, id=, p_send_wqe=0xaaaaaada4700, attr=) at dev/ring_simple.cpp:776
#4 0x0000fffff7077794 in neigh_eth::send_arp_request (this=0xaaaaaada4560, is_broadcast=) at proto/neighbour.cpp:1661
#5 0x0000fffff70724a4 in neigh_entry::send_discovery_request (this=0xaaaaaada4560) at proto/neighbour.cpp:393
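Both traces start from neigh_entry::send_discovery_request, i.e. XLIO resolving the target's MAC address with a broadcast ARP request, which has to succeed before any TCP traffic can flow. As a rough guide to what a capture on the switch or the target should show if this frame actually left the port, here is a sketch of a standard Ethernet ARP request (RFC 826 layout). It is not XLIO's code; the source MAC and the sender IP 20.20.20.3 are hypothetical, and only the target 20.20.20.4 comes from the issue.

```python
import socket
import struct


def build_arp_request(src_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Return a broadcast Ethernet II + ARP 'who-has' frame (42 bytes, unpadded)."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    # Ethernet II header: dst = broadcast, src = our MAC, EtherType 0x0806 (ARP).
    eth = b"\xff" * 6 + src_mac + struct.pack("!H", 0x0806)
    # ARP fixed part: HTYPE=1 (Ethernet), PTYPE=0x0800 (IPv4), HLEN=6, PLEN=4, OPER=1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender hardware/protocol address.
    arp += src_mac + socket.inet_aton(sender_ip)
    # Target hardware address is unknown (zeros), followed by the target IP.
    arp += b"\x00" * 6 + socket.inet_aton(target_ip)
    return eth + arp


if __name__ == "__main__":
    # Hypothetical MAC and sender IP; 20.20.20.4 is the target from the issue.
    frame = build_arp_request(bytes.fromhex("020000000001"), "20.20.20.3", "20.20.20.4")
    print(len(frame), frame.hex())
```

The 42-byte frame is typically padded to the 60-byte Ethernet minimum (64 with FCS) before transmission, so on the target side one would expect a broadcast "who-has 20.20.20.4" frame from p1's MAC address; an `arp` capture filter is enough to look for it.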

I've checked the device it's using for the ARP request and it looks correct: p1 (the name of the physical function interface on the BlueField). Thanks in advance for any help.

Cheers

Issue type

Configuration:

Actual behavior:

No traffic comes out of the network interface, even though the WQE is posted and a completion is received on the CQ.
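One way to make "no traffic comes out" concrete is to snapshot `ethtool -S p1` before and after a run and diff the TX counters. Below is a minimal parser sketch, assuming the usual `name: value` layout of `ethtool -S` output. Counter names vary by driver (mlx5, for example, exposes both software counters such as `tx_packets` and physical-port counters such as `tx_packets_phy`), and the helper names here are hypothetical.

```python
import re

# Matches lines such as "     tx_packets: 12345"; the "NIC statistics:" header has no value.
_STAT_LINE = re.compile(r"^\s*(\S+):\s*(\d+)\s*$")


def parse_ethtool_stats(text: str) -> dict[str, int]:
    """Parse `ethtool -S <iface>` output into {counter_name: value}."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        m = _STAT_LINE.match(line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def counter_deltas(before: dict[str, int], after: dict[str, int],
                   prefix: str = "tx_") -> dict[str, int]:
    """Return only the counters with the given prefix that changed between snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if name.startswith(prefix) and after[name] != before.get(name, 0)
    }
```

Used with the text captured from `ethtool -S p1` immediately before and after an XLIO run, an empty result from `counter_deltas(before, after)` would confirm that nothing was counted on the TX side of that interface, even though XLIO reports completions.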

Expected behavior:

SPDK perf and iperf are able to connect and send traffic.

Steps to reproduce:

sudo LD_PRELOAD=/opt/mellanox/libxlio/lib/libxlio.so iperf -t 30 -c 20.20.20.4 -m -P 1 -i 1 -M 1500

or

sudo SPDK_XLIO_PATH=/opt/mellanox/libxlio/lib/libxlio.so XLIO_TRACELEVEL=DEBUG ~/spdk-23.01/build/examples/perf -q 64 -o $((2**12)) -w randread -r 'trtype:nvda_tcp adrfam:IPv4 traddr:20.20.20.4 trsvcid:4420' -t 300 -c 0x01 --transport-stats -G --default-sock-impl xlio
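The two reproduction commands load XLIO differently: iperf is an unmodified sockets application, so XLIO is interposed with LD_PRELOAD, while SPDK perf is given the library path through SPDK_XLIO_PATH and selects it with `--default-sock-impl xlio`. A small sketch that rebuilds both commands as argument lists makes that difference explicit; the role of SPDK_XLIO_PATH is our reading of the command line, not something confirmed in this thread, and the function names are hypothetical.

```python
import shlex

# Library path exactly as given in the steps to reproduce.
XLIO_LIB = "/opt/mellanox/libxlio/lib/libxlio.so"


def iperf_argv(server: str = "20.20.20.4") -> list[str]:
    # LD_PRELOAD is handed to sudo as VAR=value; whether sudo honors it
    # depends on the sudoers policy.
    return ["sudo", f"LD_PRELOAD={XLIO_LIB}", "iperf", "-t", "30", "-c", server,
            "-m", "-P", "1", "-i", "1", "-M", "1500"]


def spdk_perf_argv(perf_bin: str, traddr: str = "20.20.20.4") -> list[str]:
    # No LD_PRELOAD: SPDK_XLIO_PATH (assumed) tells SPDK's xlio socket
    # implementation where libxlio lives; --default-sock-impl xlio selects it.
    trid = f"trtype:nvda_tcp adrfam:IPv4 traddr:{traddr} trsvcid:4420"
    return ["sudo", f"SPDK_XLIO_PATH={XLIO_LIB}", "XLIO_TRACELEVEL=DEBUG", perf_bin,
            "-q", "64", "-o", str(2**12), "-w", "randread", "-r", trid,
            "-t", "300", "-c", "0x01", "--transport-stats", "-G",
            "--default-sock-impl", "xlio"]


if __name__ == "__main__":
    # Print shell-quoted command lines equivalent to the ones in the issue.
    print(shlex.join(iperf_argv()))
    print(shlex.join(spdk_perf_argv("./build/examples/perf")))
```

Passing these lists to `subprocess.run` avoids shell quoting issues with the space-separated transport ID; note that a `~` in `perf_bin` would need `os.path.expanduser` first, since no shell expands it.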

iftahl commented 1 month ago

@mpodles, I assume you are working within the Arm OS on the BlueField and not on the x86 host; is that correct?

If so, please add the outputs of the following:

cat /etc/mlnx-release
sudo flint -d 03:00.0 q
ip a
sudo ovs-vsctl show
sudo ibdev2netdev -v
sudo mlxconfig -d 03:00.0 -e q
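A minimal sketch for collecting the requested outputs into a single report. It runs exactly the commands listed above, takes the PCI address 03:00.0 from the request (adjust it for other devices), and records a placeholder instead of failing when a tool is not installed; `collect` and `DIAG_COMMANDS` are hypothetical names.

```python
import subprocess

PCI_ADDR = "03:00.0"  # PCI address used in the maintainer's request

DIAG_COMMANDS = [
    ["cat", "/etc/mlnx-release"],
    ["sudo", "flint", "-d", PCI_ADDR, "q"],
    ["ip", "a"],
    ["sudo", "ovs-vsctl", "show"],
    ["sudo", "ibdev2netdev", "-v"],
    ["sudo", "mlxconfig", "-d", PCI_ADDR, "-e", "q"],
]


def collect(commands=DIAG_COMMANDS, runner=subprocess.run) -> str:
    """Run each command and return '$ cmd' headers followed by its combined output."""
    sections = []
    for argv in commands:
        try:
            proc = runner(argv, capture_output=True, text=True, check=False)
            out = (proc.stdout or "") + (proc.stderr or "")
        except FileNotFoundError as exc:
            # The binary is missing on this system; note it and keep going.
            out = f"<not available: {exc}>"
        sections.append(f"$ {' '.join(argv)}\n{out.rstrip()}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(collect())
```

The `runner` parameter exists only so the function can be exercised without touching real hardware; by default it calls `subprocess.run`.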

mpodles commented 1 month ago

Dear @iftahl

Thanks for the support, but we've managed to debug the issue: we were using the PF instead of an SF on the BlueField Arm SoC. I had assumed that XLIO, being a userspace stack, could use p1 (the PF) the same way DPDK can, without OvS in the way, but it appears that a software netdev representor and OvS are required.

Thanks for getting back to me and I believe this can be closed.