EttusResearch / uhd

The USRP™ Hardware Driver Repository
http://uhd.ettus.com

DPDK initialization freezes if dpdk_corelist does not contain 0 #552

Open sinddennhierschonallenamenvergeben opened 2 years ago

sinddennhierschonallenamenvergeben commented 2 years ago

Issue Description

When using DPDK, initialization freezes if dpdk_corelist does not contain lcore 0. The program does not give any information about the cause of the problem.

Setup Details

The problem occurs both with UHD 4.0.0 (using DPDK 18.11) and with the current state of the Git repository (UHD_4.2.0.git-209-gf23ab721) using DPDK 21.11.

I am using a Mellanox Technologies MCX512A-ACAT, to which the USRP X310 is connected with two 10G Ethernet links. Other DPDK applications also run on this setup without problems.

Expected Behavior

I expect at least a warning that uhd-dpdk will not work if it does not have lcore 0 available. The cleaner solution is of course to adapt the program to run on any lcore. This simplifies the integration into an existing system significantly.

Actual Behaviour

The program just freezes during initialization without giving any hint as to the cause. When dpdk_corelist is set to 0,1,2, the problem disappears. Without DPDK, i.e. if I omit the argument use_dpdk=1, it also works fine.

Steps to reproduce the problem

Using the following uhd.conf file:

[use_dpdk=1]
dpdk_mtu=9000
dpdk_corelist=3,4,5
dpdk_num_mbufs=4095
dpdk_mbuf_cache_size=315

[dpdk_mac=04:3f:72:ac:30:36]
dpdk_lcore = 1
dpdk_ipv4 = 192.168.40.1/24

[dpdk_mac=04:3f:72:ac:30:37]
dpdk_lcore = 2
dpdk_ipv4 = 192.168.30.1/24

Test the connection using the benchmark_rate tool:

UHD_LOG_LEVEL=debug ./benchmark_rate --rx_rate 100e6 --rx_subdev "A:0" --rx_channels 0 --args "mgmt_addr=192.168.30.2,addr=192.168.40.2,use_dpdk=1"

[INFO] [UHD] linux; GNU C++ version 10.2.1 20210110; Boost_107400; UHD_4.2.0.git-209-gf23ab721
[DEBUG] [PREFS] Loaded user config file /root/.config/uhd.conf
EAL: Detected 32 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Invalid NUMA socket, default to 0
EAL: Invalid NUMA socket, default to 0
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:08:00.0 (socket 0)
mlx5_pci: Size 0xFFFF is not power of 2, will be aligned to 0x10000.
EAL: Invalid NUMA socket, default to 0
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:08:00.1 (socket 0)
mlx5_pci: Size 0xFFFF is not power of 2, will be aligned to 0x10000.
EAL: No legacy callbacks, legacy socket not created

Additional Information

I first sought advice on the mailing list, but after a long period of trial and error I found the issue described here.

alynch-ni commented 2 years ago

Looking at your reproducing case, it looks like the problem is slightly different than you describe. I have been using DPDK versions 18.11, 19.11, and 20.11 all without core 0 assigned to dpdk_corelist without any problems.

The bigger issue is that the cores specified in the dpdk_lcore for each adapter are not on the dpdk_corelist. The dpdk_corelist should include one core for the base DPDK functionality to run on followed by all the cores assigned to the individual adapters in their dpdk_lcore value. You should be able to confirm that is the problem by changing the dpdk_lcore values on the adapters to 4 and 5 to match the second and third entries in the dpdk_corelist.
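As a sketch of that suggestion, this is the uhd.conf from the report with only the two dpdk_lcore values changed, from 1 and 2 to 4 and 5. Core 3 is then left for the base DPDK functionality. It has not been run against this setup, so treat it as a starting point rather than a confirmed fix.

```ini
[use_dpdk=1]
dpdk_mtu=9000
dpdk_corelist=3,4,5
dpdk_num_mbufs=4095
dpdk_mbuf_cache_size=315

[dpdk_mac=04:3f:72:ac:30:36]
dpdk_lcore = 4
dpdk_ipv4 = 192.168.40.1/24

[dpdk_mac=04:3f:72:ac:30:37]
dpdk_lcore = 5
dpdk_ipv4 = 192.168.30.1/24
```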

The freezing behavior does seem bad and it would be ideal to check for the problem and properly notify the user.
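Until such a check exists in UHD, a small standalone script can catch the mismatch before starting the application. The sketch below is not part of UHD: the function name find_unlisted_lcores is hypothetical, and it assumes that dpdk_corelist is a plain comma-separated list of integers, as in the report. It only verifies that each adapter's dpdk_lcore appears in dpdk_corelist.

```python
import configparser

# The uhd.conf from the report, reproduced as a string for illustration.
REPORTED_CONF = """
[use_dpdk=1]
dpdk_mtu=9000
dpdk_corelist=3,4,5
dpdk_num_mbufs=4095
dpdk_mbuf_cache_size=315

[dpdk_mac=04:3f:72:ac:30:36]
dpdk_lcore = 1
dpdk_ipv4 = 192.168.40.1/24

[dpdk_mac=04:3f:72:ac:30:37]
dpdk_lcore = 2
dpdk_ipv4 = 192.168.30.1/24
"""


def find_unlisted_lcores(conf_text):
    """Return {section: lcore} for adapters whose dpdk_lcore is not in dpdk_corelist.

    Hypothetical helper, not a UHD API. Assumes dpdk_corelist is a
    comma-separated list of integers (no ranges).
    """
    # Use only "=" as delimiter so the colons in MAC addresses are left alone.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(conf_text)

    # Collect the cores handed to DPDK from whichever section defines them.
    corelist = set()
    for section in parser.sections():
        if parser.has_option(section, "dpdk_corelist"):
            raw = parser.get(section, "dpdk_corelist")
            corelist = {int(core) for core in raw.split(",")}

    # Flag every adapter section whose dpdk_lcore is outside that set.
    problems = {}
    for section in parser.sections():
        if section.startswith("dpdk_mac=") and parser.has_option(section, "dpdk_lcore"):
            lcore = parser.getint(section, "dpdk_lcore")
            if lcore not in corelist:
                problems[section] = lcore
    return problems


if __name__ == "__main__":
    for section, lcore in find_unlisted_lcores(REPORTED_CONF).items():
        print(f"[{section}] dpdk_lcore={lcore} is not in dpdk_corelist")
```

For the configuration in the report, both adapter sections are flagged. With dpdk_lcore changed to 4 and 5, the function returns an empty dictionary.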