OpenVisualCloud / Media-Transport-Library

A real-time media transport (DPDK, AF_XDP, RDMA) stack for both raw and compressed video based on COTS hardware.
BSD 3-Clause "New" or "Revised" License

Cannot run sample RxSt20PipelineSample, mlx5_common: No Verbs device matches PCI device #402

Closed prankurgit closed 1 year ago

prankurgit commented 1 year ago

Dear Media-library team,

I have configured my Debian system to install DPDK and the Media Transport Library. I followed the build and run guides and can successfully create VFs and bind vfio-pci to them. My setup:

NIC: Mellanox ConnectX-6 Dx (MLNX_OFED 23.04-1.1.3.0)
OS: Debian 11 (kernel 5.15.55)
CPU: Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz, 16 cores, no NUMA, hyperthreading disabled in the BIOS

All cores except core 0 are isolated from the Linux kernel scheduler. NOTE: no vmx flag is shown by the lscpu command.

  1. I installed MLNX_OFED-23.04-1.1.3.0 with the following command: $ ./mlnxofedinstall --dpdk --upstream-libs --skip-distro-check
  2. I enabled the Intel VT-d setting in the BIOS, added iommu to the kernel command line, enabled hugepages, etc. NOTE: SR-IOV support is disabled in the BIOS. Do I need it?
  3. I then followed the steps to clone and install both the Media Transport Library and DPDK.
  4. With the nicctl.sh script I could create VFs and bind vfio-pci to them.

    root@KC200-24-FLEX-AIC:~/Media-Transport-Library# ./script/nicctl.sh create_vf 0000:c3:00.0
    0000:c3:00.0 'MT2892 Family [ConnectX-6 Dx] 101d' if=mlnx1 drv=mlx5_core unused= Active
    Bind 0000:c3:00.2(eth0) to vfio-pci success
    Bind 0000:c3:00.3(eth1) to vfio-pci success
    Bind 0000:c3:00.4(eth2) to vfio-pci success
    Bind 0000:c3:00.5(eth3) to vfio-pci success
    Bind 0000:c3:00.6(eth4) to vfio-pci success
    Bind 0000:c3:00.7(eth5) to vfio-pci success
    Create VFs on PF bdf: 0000:c3:00.0 mlnx1 succ

    root@KC200-24-FLEX-AIC:~/Media-Transport-Library# ./script/nicctl.sh status 0000:c3:00.0
    0000:c3:00.0 'MT2892 Family [ConnectX-6 Dx] 101d' if=mlnx1 drv=mlx5_core unused=vfio-pci Active

    root@KC200-24-FLEX-AIC:~/Media-Transport-Library# ./script/nicctl.sh status 0000:c3:00.2
    0000:c3:00.2 'ConnectX Family mlx5Gen Virtual Function 101e' drv=vfio-pci unused=mlx5_core
    Bind bdf: 0000:c3:00.2 to kernel eth0 succ

    root@KC200-24-FLEX-AIC:~/Media-Transport-Library# lspci | grep Mel
    c3:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
    c3:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
    c3:00.2 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
    c3:00.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
    c3:00.4 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
    c3:00.5 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
    c3:00.6 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
    c3:00.7 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function

    root@KC200-24-FLEX-AIC:~/Media-Transport-Library# lsmod | grep -e ib -e mlx
    ib_ipoib 151552 0
    ib_cm 139264 2 rdma_cm,ib_ipoib
    ib_umad 40960 0
    mlx5_ib 454656 0
    ib_uverbs 151552 2 rdma_ucm,mlx5_ib
    ib_core 458752 8 rdma_cm,ib_ipoib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
    mlx5_core 2019328 1 mlx5_ib
    mlxfw 36864 1 mlx5_core
    mlxdevm 176128 1 mlx5_core
    mlx_compat 24576 11 rdma_cm,ib_ipoib,mlxdevm,iw_cm,ib_umad,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core
    ptp 32768 4 igb,mlx5_core
    pci_hyperv_intf 16384 1 mlx5_core

    $ dmesg
    ...
    [ 369.223349] VFIO - User Level meta-driver version: 0.3
    [ 369.293820] mlx5_core 0000:c3:00.0: E-Switch: Enable: mode(LEGACY), nvfs(6), active vports(7)
    [ 369.408209] pci 0000:c3:00.2: [15b3:101e] type 00 class 0x020000
    [ 369.414514] pci 0000:c3:00.2: enabling Extended Tags
    [ 369.420836] pci 0000:c3:00.2: Adding to iommu group 144
    [ 369.427207] mlx5_core 0000:c3:00.2: enabling device (0000 -> 0002)
    [ 369.434092] mlx5_core 0000:c3:00.2: firmware version: 22.37.1014
    [ 369.619426] mlx5_core 0000:c3:00.2: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
    [ 369.644794] mlx5_core 0000:c3:00.2: Assigned random MAC address d6:52:af:af:f9:4d
    [ 369.652371] mlx5_core 0000:c3:00.2: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
    [ 369.778229] mlx5_core 0000:c3:00.2: Supported tc offload range - chains: 1, prios: 1
    ....

  5. But when running the RxSt20PipelineSample application I get the following error. NOTE: I have an ST2110 source generating a signal at multicast address 239.0.90.1, and the source IP is 192.168.1.90.

    root@KC200-24-FLEX-AIC:~/Media-Transport-Library# ./build/app/RxSt20PipelineSample --p_port 0000:c3:00.2 --p_sip 192.168.1.90 --p_rx_ip 239.0.90.1
    MT: dev_eal_init(0), port_param: 0000:c3:00.2
    MT: dev_eal_init, wait eal_init_thread done
    EAL: Detected CPU lcores: 16
    EAL: Detected NUMA nodes: 1
    EAL: Detected shared linkage of DPDK
    EAL: Selected IOVA mode 'VA'
    EAL: No free 1048576 kB hugepages reported on node 0
    EAL: VFIO support initialized
    EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.2 (socket -1)
    mlx5_common: No Verbs device matches PCI device 0000:c3:00.2, are kernel drivers loaded?
    mlx5_common: Verbs device not found: 0000:c3:00.2
    mlx5_common: Failed to initialize device context.
    EAL: Requested device 0000:c3:00.2 cannot be used
    EAL: Bus (pci) probe failed.
    TELEMETRY: No legacy callbacks, legacy socket not created
    MT: st version: 23.12.0 Fri Aug 4 09:02:04 2023 a578ad68 gcc-10.2.1, dpdk version: DPDK 23.03.0
    MT: Error: mt_dev_get_socket, failed to locate 0000:c3:00.2. Please run nicctl.sh
    MT: Error: mtl_init, get socket fail -19
    main: mtl_init fail

Have you seen this issue before?

Cheers Prankur

frankdjx commented 1 year ago

It seems to fail at DPDK startup. Can you run the DPDK sample application (testpmd) to confirm that the MLX PMD is working well? BTW, the NICs we verified are the Intel E810/E710 series; the status on other NICs is not known.

prankurgit commented 1 year ago

Dear Mr. Du, Thanks for your reply. Please see the following output from the dpdk-testpmd command.

    ./build/app/dpdk-testpmd -l 1-4 -n 4 -- -i
    EAL: Detected CPU lcores: 16
    EAL: Detected NUMA nodes: 1
    EAL: Detected static linkage of DPDK
    EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
    EAL: Selected IOVA mode 'VA'
    EAL: VFIO support initialized
    EAL: Probe PCI driver: mlx5_pci (15b3:101d) device: 0000:c3:00.0 (socket -1)
    EAL: Probe PCI driver: mlx5_pci (15b3:101d) device: 0000:c3:00.1 (socket -1)
    EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.2 (socket -1)
    mlx5_common: No Verbs device matches PCI device 0000:c3:00.2, are kernel drivers loaded?
    mlx5_common: Verbs device not found: 0000:c3:00.2
    mlx5_common: Failed to initialize device context.
    EAL: Requested device 0000:c3:00.2 cannot be used
    EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.3 (socket -1)
    mlx5_common: No Verbs device matches PCI device 0000:c3:00.3, are kernel drivers loaded?
    mlx5_common: Verbs device not found: 0000:c3:00.3
    mlx5_common: Failed to initialize device context.
    EAL: Requested device 0000:c3:00.3 cannot be used
    EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.4 (socket -1)
    mlx5_common: No Verbs device matches PCI device 0000:c3:00.4, are kernel drivers loaded?
    mlx5_common: Verbs device not found: 0000:c3:00.4
    mlx5_common: Failed to initialize device context.
    EAL: Requested device 0000:c3:00.4 cannot be used
    EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.5 (socket -1)
    mlx5_common: No Verbs device matches PCI device 0000:c3:00.5, are kernel drivers loaded?
    mlx5_common: Verbs device not found: 0000:c3:00.5
    mlx5_common: Failed to initialize device context.
    EAL: Requested device 0000:c3:00.5 cannot be used
    EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.6 (socket -1)
    mlx5_common: No Verbs device matches PCI device 0000:c3:00.6, are kernel drivers loaded?
    mlx5_common: Verbs device not found: 0000:c3:00.6
    mlx5_common: Failed to initialize device context.
    EAL: Requested device 0000:c3:00.6 cannot be used
    EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.7 (socket -1)
    mlx5_common: No Verbs device matches PCI device 0000:c3:00.7, are kernel drivers loaded?
    mlx5_common: Verbs device not found: 0000:c3:00.7
    mlx5_common: Failed to initialize device context.
    EAL: Requested device 0000:c3:00.7 cannot be used
    TELEMETRY: No legacy callbacks, legacy socket not created
    Interactive-mode selected
    Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
    testpmd: create a new mbuf pool : n=171456, size=2176, socket=0
    testpmd: preferred mempool ops selected: ring_mp_mc
    Configuring Port 0 (socket 0)
    Port 0: E8:EB:D3:6C:60:FA
    Configuring Port 1 (socket 0)
    Port 1: E8:EB:D3:6C:60:FB
    Checking link statuses...
    Done
    testpmd>

For the physical interfaces 0000:c3:00.0 and .1 there was no such message, but the virtual functions 0000:c3:00.2 to .7 all show the same error message.

Do you believe I should enable PCIe Extended Tags in the BIOS?

I also watched your YouTube video, "Real Time low latency media transport stack based on dpdk", and decided to try our Mellanox CX6-Dx cards to check whether I can find an alternative to the Rivermax SDK and kernel drivers.

Cheers Prankur

frankdjx commented 1 year ago

Not sure. The most likely cause is that there is no VF PMD support in DPDK for this NIC. The PF PMD cannot be used for a VF.

frankdjx commented 1 year ago

Also, check http://doc.dpdk.org/guides/nics/mlx5.html for more information.

prankurgit commented 1 year ago

Dear Mr. Du,

Thanks for your prompt reply. I checked their documentation and it looks like the DPDK PMD is supported for the ConnectX-6 Dx. I will try with their tested hardware platform / operating system combination.

I want to ask some questions regarding the DPDK user summit 2022 video. If you can, please share your email with prankur.chauhan89@gmail.com and I will take it up there.

By the way, a stupid question: I am not running any VM; I am running Debian directly on the hardware. So there is no hypervisor child partition / parent partition redirecting calls to read or write data from the PCIe NIC.

I assume the Media Transport Library also works on a native operating system without any virtual machine, right?

Cheers Prankur

frankdjx commented 1 year ago

Certainly, the media transport library operates efficiently on Virtual Functions (VFs) with the assistance of VFIO for bare metal setup. These VFs are created by Single Root I/O Virtualization (SR-IOV). From a user space perspective, there's no discernible difference between these VFs and the Physical Functions (PFs).

prankurgit commented 1 year ago

Dear Mr. Du, I have made some progress in terms of testing dpdk-testpmd software. The issue was that NVIDIA PMD uses the mlx5_core driver and NOT the vfio-pci driver unlike other PMDs.

“ PMDs which use the bifurcated driver co-exists with the device kernel driver. On such model the NIC is controlled by the kernel, while the data path is performed by the PMD directly on top of the device. “
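One way to confirm which kernel driver a VF is currently bound to is to read the sysfs driver link for its PCI address. The following is a minimal C sketch, assuming a Linux host with sysfs mounted at /sys. The helper names (driver_from_link, pci_bound_driver, is_bifurcated_ready) are hypothetical and are not part of DPDK or the Media Transport Library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return the last path component of a sysfs driver link target,
 * e.g. "../../../../bus/pci/drivers/mlx5_core" -> "mlx5_core". */
static const char *driver_from_link(const char *target)
{
    assert(target != NULL);
    const char *slash = strrchr(target, '/');
    return slash ? slash + 1 : target;
}

/* Read /sys/bus/pci/devices/<bdf>/driver and copy the name of the bound
 * kernel driver into out. Returns 0 on success, -1 if the device does not
 * exist or no driver is bound to it. */
static int pci_bound_driver(const char *bdf, char *out, size_t out_len)
{
    char path[PATH_MAX];
    char target[PATH_MAX];

    assert(bdf != NULL && out != NULL && out_len > 0);
    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/driver", bdf);
    ssize_t n = readlink(path, target, sizeof(target) - 1);
    if (n < 0)
        return -1;
    target[n] = '\0'; /* readlink() does not NUL-terminate the result */
    snprintf(out, out_len, "%s", driver_from_link(target));
    return 0;
}

/* A bifurcated PMD such as mlx5 expects the kernel driver to stay bound. */
static int is_bifurcated_ready(const char *driver)
{
    assert(driver != NULL);
    return strcmp(driver, "mlx5_core") == 0;
}
```

For the VFs in this thread, such a check would be expected to report vfio-pci after the original nicctl.sh binding and mlx5_core once the VF is left with the kernel driver.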

Unfortunately, the Media Transport Library test application now has another issue (it is also seen on the PFs, i.e. the physical functions):

MT: Error: parse_driver_info, unknown nic driver mlx5_pci

    $ ./build/app/RxSt20PipelineSample --p_port 0000:c3:00.3 --p_sip 192.168.1.90 --p_rx_ip 239.0.90.1
    MT: dev_eal_init(0), port_param: 0000:c3:00.3
    MT: dev_eal_init, wait eal_init_thread done
    EAL: Detected CPU lcores: 16
    EAL: Detected NUMA nodes: 1
    EAL: Detected shared linkage of DPDK
    EAL: Selected IOVA mode 'VA'
    EAL: No free 1048576 kB hugepages reported on node 0
    EAL: VFIO support initialized
    EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.3 (socket -1)
    TELEMETRY: No legacy callbacks, legacy socket not created
    MT: st version: 23.12.0 Wed Aug 9 12:31:50 2023 5ddeda73 gcc-10.2.1, dpdk version: DPDK 23.03.0
    MT: mt_dev_get_socket, direct soc_id from SOCKET_ID_ANY to 0 for 0000:c3:00.3
    MT: mtl_init(0), socket_id 0
    MT: Error: parse_driver_info, unknown nic driver mlx5_pci
    MT: Error: mt_dev_if_init, parse_driver_info fail(-5) for 0000:c3:00.3
    MT: dev_close_port(0), port not started
    MT: Error: mtl_init, st dev if init fail -5
    MT: Warn: mt_stat_unregister, cb 0x7f2faaf2c530 priv 0x118082abc0 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee1360 priv 0x1180832378 not found
    MT: mt_cni_uinit, succ
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x1180832728 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x1180838b10 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x118083eef8 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x11808452e0 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x118084b6c8 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x1180851ab0 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x1180857e98 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x118085e280 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x1180864668 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x118086aa50 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x1180870e38 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x1180877220 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x118087d608 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x11808839f0 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x1180889dd8 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x11808901c0 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x11808965a8 not found
    MT: Warn: mt_stat_unregister, cb 0x7f2faaee0560 priv 0x118089c990 not found
    MT: Error: dev_uinit_lcores, no lcore shm attached
    MT: Warn: mt_stat_unregister, cb 0x7f2faaedd7b0 priv 0x118082abc0 not found
    MT: dev_stop_port(0), port not started
    MT: mt_dev_free, succ
    MT: mt_main_free, succ
    MT: dev_close_port(0), port not started
    MT: mt_dev_uinit, succ
    MT: mtl_uninit, succ
    main: mtl_init fail

With dpdk-testpmd, the interface seems to work:

    ~/dpdk# ./build/app/dpdk-testpmd -l 1-5 -n 4 -a 0000:c3:00.3 -- -i
    EAL: Detected CPU lcores: 16
    EAL: Detected NUMA nodes: 1
    EAL: Detected static linkage of DPDK
    EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
    EAL: Selected IOVA mode 'VA'
    EAL: VFIO support initialized
    EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.3 (socket -1)
    TELEMETRY: No legacy callbacks, legacy socket not created
    Interactive-mode selected
    Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
    testpmd: create a new mbuf pool : n=179456, size=2176, socket=0
    testpmd: preferred mempool ops selected: ring_mp_mc

    Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.

    Configuring Port 0 (socket 0)
    Port 0: F2:E2:55:9E:30:D8
    Checking link statuses...
    Done
    testpmd> start
    io packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support enabled, MP allocation mode: native
    Logical Core 2 (socket 0) forwards packets on 1 streams:
    RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00

    io packet forwarding packets/burst=32
    nb forwarding cores=1 - nb forwarding ports=1
    port 0: RX queue number: 1 Tx queue number: 1
    Rx offloads=0x0 Tx offloads=0x10000
    RX queue: 0
    RX desc=256 - RX free threshold=64
    RX threshold registers: pthresh=0 hthresh=0 wthresh=0
    RX Offloads=0x0
    TX queue: 0
    TX desc=256 - TX free threshold=0
    TX threshold registers: pthresh=0 hthresh=0 wthresh=0
    TX offloads=0x10000 - TX RS bit threshold=0
    testpmd> stop
    Telling cores to stop...
    Waiting for lcores to finish...

    ---------------------- Forward statistics for port 0 ----------------------
    RX-packets: 4 RX-dropped: 0 RX-total: 4
    TX-packets: 4 TX-dropped: 0 TX-total: 4

    +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
    RX-packets: 4 RX-dropped: 0 RX-total: 4
    TX-packets: 4 TX-dropped: 0 TX-total: 4
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

    Done.
    testpmd> quit

    Stopping port 0...
    Stopping ports...
    Done

    Shutting down port 0...
    Closing ports...
    Port 0 is closed
    Done

    Bye...

Question: have you encountered this issue before?

Cheers Prankur

frankdjx commented 1 year ago

Hi Prankur,

You have to add this new device to the supported list: https://github.com/OpenVisualCloud/Media-Transport-Library/blob/main/lib/src/mt_dev.c#L29

Something like below:

    {
        .name = "mlx5_pci",
        .port_type = MT_PORT_PF,
        .drv_type = MT_DRV_MLX5, /* add a new enum in the header file */
        .flow_type = MT_FLOW_ALL,
    },

Please note that the rate limit feature is only available on the Intel E810; for other devices, TSC pacing is used.
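To illustrate why the "unknown nic driver mlx5_pci" error disappears once an entry exists, here is a simplified, self-contained sketch of a lookup in a name-keyed driver table. All ex_ types, the table contents, and the function name are hypothetical stand-ins for illustration; the real definitions are in mt_dev.c and its headers and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the library's port and driver enums. */
enum ex_port_type { EX_PORT_PF, EX_PORT_VF };
enum ex_drv_type { EX_DRV_ICE, EX_DRV_IAVF, EX_DRV_MLX5 };

struct ex_dev_driver_info {
    const char *name;            /* driver name as reported by DPDK */
    enum ex_port_type port_type;
    enum ex_drv_type drv_type;
};

/* Illustrative table; the last row plays the role of the new mlx5 entry. */
static const struct ex_dev_driver_info ex_drivers[] = {
    { .name = "net_ice",  .port_type = EX_PORT_PF, .drv_type = EX_DRV_ICE },
    { .name = "net_iavf", .port_type = EX_PORT_VF, .drv_type = EX_DRV_IAVF },
    { .name = "mlx5_pci", .port_type = EX_PORT_PF, .drv_type = EX_DRV_MLX5 },
};

/* Return the table entry for a driver name, or NULL if it is not listed.
 * A NULL result corresponds to the "unknown nic driver" error path. */
static const struct ex_dev_driver_info *ex_parse_driver_info(const char *driver)
{
    assert(driver != NULL);
    for (size_t i = 0; i < sizeof(ex_drivers) / sizeof(ex_drivers[0]); i++) {
        if (strcmp(ex_drivers[i].name, driver) == 0)
            return &ex_drivers[i];
    }
    return NULL;
}
```

With a table like this, initialization can only proceed for driver names that appear in the list, which is why a PMD that probes fine in testpmd can still be rejected by the library.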

prankurgit commented 1 year ago

Dear Mr. Du,

Thanks for the patch. I added support for the mlx5_pci driver as you suggested. The RxSt20PipelineSample works and I can receive 1 x ST2110-20 stream from my source. I also verified the PCIe bandwidth with the Intel PCM tool (pcm-iio). I did not test performance extensively, but at least it is a starting point.

By the way, I saw that the pipeline sample did not use the isolated cores. Is there a command-line option like in DPDK (-l x-y) where I can specify which cores run the rte workers?

I do not quite understand what you mean by rate limit. Also, by TSC pacing, do you mean the TSC timer is used for TX traffic shaping (ST2110-21) and not the HPET timer?

Cheers Prankur

frankdjx commented 1 year ago

Hi Prankur,

Great to hear it works. Can you help create a PR to upstream the patch?

Yes, the TSC time source is used for pacing if no rate limit is available. In this case, we use TSC time to decide when to put the packet into the NIC queue. Please note that TSC pacing can't fully comply with narrow gapping, since the actual transmit time is not controllable.
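The arithmetic behind software pacing can be sketched as follows. This is a minimal illustration, not the library's pacing code: it spreads the packets of one frame evenly across the frame period and converts the result to TSC ticks. The function names and the packets-per-frame value are assumptions; the 2400000000 Hz TSC frequency and the 16.683333 ms frame time are taken from the logs in this thread. Real ST2110-21 pacing uses a more detailed timing model than this linear spacing.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_S 1000000000ULL

/* Convert a duration in nanoseconds to TSC ticks. The split into whole
 * seconds and remainder avoids overflowing ns * tsc_hz for long intervals. */
static uint64_t ns_to_tsc(uint64_t ns, uint64_t tsc_hz)
{
    return (ns / NS_PER_S) * tsc_hz + (ns % NS_PER_S) * tsc_hz / NS_PER_S;
}

/* Target TSC value at which packet pkt_idx of a frame should be handed to
 * the NIC queue, assuming packets are spaced evenly over the frame period. */
static uint64_t pkt_target_tsc(uint64_t frame_start_tsc, uint64_t frame_tsc,
                               uint32_t pkt_idx, uint32_t pkts_per_frame)
{
    assert(pkts_per_frame > 0 && pkt_idx < pkts_per_frame);
    return frame_start_tsc + frame_tsc * pkt_idx / pkts_per_frame;
}
```

A sender would busy-wait until the TSC reaches the target value before enqueuing each packet. The NIC may still transmit it slightly later, which is the reason software pacing cannot guarantee the exact wire timing.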

Rate limit is a hardware feature of the E810. We use it with some software creativity to achieve the strict narrow gapping mode.

For isolated cores, the lib already supports this via lcores in struct mtl_init_params. The pipeline sample does not add this customization argument, but it can easily be added. The RxTxApp already has --lcores support, see https://github.com/OpenVisualCloud/Media-Transport-Library/blob/main/app/src/args.c#L501.

prankurgit commented 1 year ago

Dear Mr. Du,

The patch is mlx5.txt

Please check the test results.

    $ ~/Media-Transport-Library# ./script/nicctl.sh create_vf 0000:c3:00.0 2
    0000:c3:00.0 'MT2892 Family [ConnectX-6 Dx] 101d' if=mlnx1 drv=mlx5_core unused=vfio-pci Active
    PMD uses bifurcated driver, No need to bind the 0000:c3:00.2(eth0) to vfio-pci
    PMD uses bifurcated driver, No need to bind the 0000:c3:00.3(eth1) to vfio-pci
    Create VFs on PF bdf: 0000:c3:00.0 mlnx1 succ

    $ ~/Media-Transport-Library# ./build/app/RxSt20PipelineSample --p_port 0000:c3:00.3 --p_sip 192.168.1.90 --p_rx_ip 239.0.90.1
    MT: dev_eal_init(0), port_param: 0000:c3:00.3
    MT: dev_eal_init, wait eal_init_thread done
    EAL: Detected CPU lcores: 16
    EAL: Detected NUMA nodes: 1
    EAL: Detected shared linkage of DPDK
    EAL: Selected IOVA mode 'VA'
    EAL: No free 1048576 kB hugepages reported on node 0
    EAL: VFIO support initialized
    EAL: Probe PCI driver: mlx5_pci (15b3:101e) device: 0000:c3:00.3 (socket -1)
    TELEMETRY: No legacy callbacks, legacy socket not created
    MT: st version: 23.12.0 Thu Aug 10 07:47:10 2023 b9262130-dirty gcc-10.2.1, dpdk version: DPDK 23.03.0
    MT: mt_dev_get_socket, direct soc_id from SOCKET_ID_ANY to 0 for 0000:c3:00.3
    MT: mtl_init(0), socket_id 0
    MT: mt_dev_if_init(0), use mt ptp source
    MT: mt_dev_if_init(0), user request queues tx 0 rx 1, deprecated sessions tx 0 rx 0
    MT: Warn: dev_config_port(0), failed to setup all ptype, only 0 supported
    MT: dev_config_port(0), tx_q(1 with 512 desc) rx_q (2 with 2048 desc)
    MT: mt_mempool_create_by_ops(0), succ at 0x1180afde40 size 2.156250m n 1024 d 2048 for T_P0_SYS
    MT: mt_mempool_create_by_ops(0), succ at 0x1180e8d2c0 size 6.468750m n 3072 d 2048 for R_P0Q0_MBUF
    MT: mt_mempool_create_by_ops(0), succ at 0x1181afdf00 size 4.968750m n 3072 d 1536 for R_P0Q1_MBUF
    MT: mt_dev_if_init(0), port_id 0 port_type 2 drv_type 8
    MT: mt_dev_if_init(0), dev_capa 0x14, offload 0x196af:0x18621f queue offload 0x0:0x18601f, rss : 0xf00000000803afbc
    MT: mt_dev_if_init(0), system_rx_queues_end 1 hdr_split_rx_queues_end 1
    MT: mt_dev_if_init(0), sip: 192.168.1.90
    MT: mt_dev_if_init(0), netmask: 255.255.255.0
    MT: mt_dev_if_init(0), gateway: 0.0.0.0
    MT: mt_dev_if_init(0), mac: ce:31:82:3d:90:d0
    MT: dev_init_lcores, shared memory attached at 0x7f6c52fe0000 nattch 1
    MT: dev_start_port(0), rx_defer 0
    MT: mt_eth_link_dump(0), link_speed 100g link_status 1 link_duplex 1 link_autoneg 1
    MT: Error: dev_rl_shaper_add(0), shaper add error: (-38)Function not implemented
    MT: Error: dev_tx_queue_set_rl_rate(0), rl shaper get fail for q 0
    MT: Warn: dev_if_init_pacing(0), fallback to tsc as rl init fail
    MT: mt_dev_create(0), feature 0x70, tx pacing tsc
    MT: mt_sch_mrg_init, succ with data quota 31068 M, nb_tasklets 16
    MT: mt_sch_add_quota(0:0), quota 0 total now 0
    MT: dev_stat_thread, start
    MT: mt_dev_create, succ, stat period 10s
    MT: mt_dev_get_tx_queue(0), q 0 without rl
    MT: mt_mcast_init, report every 10 seconds
    MT: mt_dev_get_rx_queue(0), q 0 ip 0.0.0.0 port 0
    MT: cni_queues_init(0), rxq 0
    MT: mt_sch_register_tasklet(0), tasklet cni registered into slot 0
    MT: cni_traffic_thread, start
    MT: st_plugins_init, succ
    MT: admin_thread, start
    MT: config_parse_json, parse kahawai.json with json-c version: 0.15
    MT: st22_decoder_register(0), st22_decoder_sample registered, device 1 cap(0x300000000000000:0x70000002b)
    MT: st22_encoder_register(0), st22_encoder_sample registered, device 1 cap(0x70000002b:0x300000000000000)
    st_plugin_create, succ with st22 sample plugin
    MT: st_plugin_register(0), /usr/local/lib/x86_64-linux-gnu/libst_plugin_st22_sample.so registered, version 1
    MT: Warn: st_plugin_register, dlopen /usr/local/lib64/libst_plugin_st22_sample.so fail
    MT: mt_main_create, succ
    MT: mtl_init, succ, tsc_hz 2400000000
    MT: mtl_init, simd level avx512_vbmi, flags 0x1
    MT: rx_st20p_init_dst_fbs(0), size 5184000 fmt 5 with 3 frames
    MT: mt_sch_add_quota(0:0), quota 2589 total now 2589
    MT: mt_sch_get(0), succ with quota_mbs 2589
    MT: mt_sch_register_tasklet(0), tasklet rvs_pkt_rx registered into slot 1
    MT: mt_sch_register_tasklet(0), tasklet rvs_ctl registered into slot 2
    MT: rvs_mgr_init(0), succ
    MT: dev_rx_queue_create_flow(0), queue 1 succ, ip 239.0.90.1 port 20000
    MT: mt_dev_get_rx_queue(0), q 1 ip 239.0.90.1 port 20000
    MT: rv_init_hw(0), port(l:0,p:0), queue 1 udp 20000
    MT: mt_mcast_join(0), new group 239.0.90.1
    MT: rv_attach(0), 3 frames with size 5184000(810,0), type 0, progressive
    MT: rv_attach(0), w 1920 h 1080 fmt ST20_FMT_YUV_422_10BIT packing 0 pt 112 flags 0x0 frame time 16.683333ms
    MT: mt_sch_add_quota(0:0), quota 1294 total now 3883
    MT: st20_rx_create_with_mask, succ on sch 0 session 0
    MT: st20p_rx_create(0), transport fmt ST20_FMT_YUV_422_10BIT, output fmt YUV422RFC4175PG2BE10
    rx_st20p_frame_thread(0), start
    MT: mt_calibrate_tsc, tscHz 2400009156
    MT: mt_dev_get_lcore, available lcore 7
    MT: sch_tasklet_func(0), start with 3 tasklets
    MT: sch_start(0), succ on lcore 7
    MT: mt_dev_start, succ
    MT: _mt_start, succ, avail ports 1
    MT: cni_traffic_thread, stop
    MT: rvs_ctl_tasklet_start(0), succ
    MT: M T D E V S T A T E
    MT: DEV(0): Avr rate, tx: 0.000040 Mb/s, rx: 2350.522345 Mb/s, pkts, tx: 1, rx: 2220362
    MT: Error: DEV(0): Status: imissed 239762 ierrors 0 oerrors 0 rx_nombuf 0
    MT: Error: rx_good_packets: 559
    MT: Error: rx_good_bytes: 739838
    MT: Error: rx_q1_packets: 559
    MT: Error: rx_q1_bytes: 739838
    MT: Error: rx_multicast_packets: 2460541
    MT: Error: rx_multicast_bytes: 3265818645
    MT: Error: tx_multicast_packets: 8
    MT: Error: tx_multicast_bytes: 763
    MT: Error: rx_out_of_buffer: 239762
    MT: CNI(0): eth_rx_rate 0 Mb/s, eth_rx_cnt 7
    MT: PTP(0): time 1691656569194256528, 2023-08-10 08:36:09
    MT: RX_VIDEO_SESSION(0,0:st20p_test): fps 0.000000 frames 0 pkts 0
    MT: RX_VIDEO_SESSION(0,0:st20p_test): throughput 0 Mb/s, cpu busy 4.796298
    MT: RX_VIDEO_SESSION(0,0): wrong hdr dropped pkts 2221485
    MT: E N D S T A T E

    ^Csample_sig_handler, signal 2
    rx_st20p_frame_thread(0), stop
    main(0), received frames 0
    MT: sch_tasklet_func(0), end with 3 tasklets
    MT: cni_traffic_thread, start
    MT: mt_dev_put_lcore, lcore 7
    MT: sch_stop(0), succ
    MT: mt_sch_stop_all, succ
    MT: _mt_stop, succ
    main(0), error, no received frames 0
    MT: RX_VIDEO_SESSION(0,0:st20p_test): fps 0.000000 frames 0 pkts 0
    MT: RX_VIDEO_SESSION(0,0:st20p_test): throughput 0 Mb/s, cpu busy 4.796298
    MT: RX_VIDEO_SESSION(0,0): wrong hdr dropped pkts 327595
    MT: mt_mcast_leave(0), group 239.0.90.1 ref cnt 0
    MT: mt_dev_put_rx_queue(0), q 1
    MT: sch_free_quota(0), quota 3883 total now 0
    MT: st20_rx_free, succ on sch 0 session 0
    MT: st22_decoder_unregister(0), unregister st22_decoder_sample
    MT: st22_encoder_unregister(0), unregister st22_encoder_sample
    st_plugin_free, succ with st22 sample plugin
    MT: admin_thread, stop
    MT: mt_sch_unregister_tasklet(0), tasklet cni(0) unregistered
    MT: cni_traffic_thread, stop
    MT: mt_dev_put_rx_queue(0), q 0
    MT: mt_cni_uinit, succ
    MT: sch_free_quota(0), quota 0 total now 0
    MT: mt_sch_put(0), ref_cnt now zero
    MT: Warn: sch_stop(0), not started
    MT: mt_sch_unregister_tasklet(0), tasklet rvs_ctl(2) unregistered
    MT: mt_sch_unregister_tasklet(0), tasklet rvs_pkt_rx(1) unregistered
    MT: rvs_mgr_uinit(0), succ
    MT: mt_dev_put_tx_queue(0), q 0
    MT: dev_stat_thread, stop
    MT: dev_stop_port(0), succ
    MT: mt_dev_free, succ
    MT: mt_main_free, succ
    MT: mt_mempool_free, free mempool R_P0Q0_MBUF
    MT: mt_mempool_free, free mempool R_P0Q1_MBUF
    MT: mt_mempool_free, free mempool T_P0_SYS
    MT: dev_close_port(0), succ
    MT: mt_dev_uinit, succ
    MT: mtl_uninit, succ

============ IP info ===============

    $ ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether a8:a1:59:c2:80:ba brd ff:ff:ff:ff:ff:ff
    3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether a8:a1:59:c2:80:bb brd ff:ff:ff:ff:ff:ff
        inet 192.168.30.24/24 brd 192.168.30.255 scope global eno1
           valid_lft forever preferred_lft forever
        inet6 fe80::aaa1:59ff:fec2:80bb/64 scope link
           valid_lft forever preferred_lft forever
    4: mlnx1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether e8:eb:d3:6c:60:fa brd ff:ff:ff:ff:ff:ff
        inet 192.168.1.24/24 brd 192.168.1.255 scope global mlnx1
           valid_lft forever preferred_lft forever
        inet6 fe80::eaeb:d3ff:fe6c:60fa/64 scope link
           valid_lft forever preferred_lft forever
    5: mlnx2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether e8:eb:d3:6c:60:fb brd ff:ff:ff:ff:ff:ff
        inet 192.168.2.24/24 brd 192.168.2.255 scope global mlnx2
           valid_lft forever preferred_lft forever
        inet6 fe80::eaeb:d3ff:fe6c:60fb/64 scope link
           valid_lft forever preferred_lft forever
    6: usb0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 9e:b7:9f:38:54:e8 brd ff:ff:ff:ff:ff:ff
    14: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 6e:cf:9f:bf:c8:49 brd ff:ff:ff:ff:ff:ff
    15: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether ce:31:82:3d:90:d0 brd ff:ff:ff:ff:ff:ff
        inet6 fe80::cc31:82ff:fe3d:90d0/64 scope link
           valid_lft forever preferred_lft forever

Cheers

frankdjx commented 1 year ago

I see many errors in the log, for example:

    MT: RX_VIDEO_SESSION(0,0): wrong hdr dropped pkts 2221485

This is caused by the payload type (left at its default) not being set correctly. You can customize it with --payload_type xxx.

The patch looks good to me. Can you create a PR so that we can proceed with the upstream process?

prankurgit commented 1 year ago

Dear Mr. Du,

Thanks for approving the pull-request.

I chose the payload type to be 112 (video) but I still see errors. For information, I tried configuring the source for both BPM and GPM, and both show the wrong header error, even though data is transferred over the PCIe bus, as seen from the pcm-iio tool.

Please see the logs media transport media_log.txt

pcm -iio media_pcm_iio.txt

frankdjx commented 1 year ago

Can you add a log at the line below to print the payload type it expects and the one the TX is actually sending? https://github.com/OpenVisualCloud/Media-Transport-Library/blob/11661b46fcfe581d2e65db4d145a56aa4c060288/lib/src/st2110/st_rx_video_session.c#L1729

This is the only place that causes this error in frame mode.

You can also use the mt_mbuf_dump_hdr API to dump the received mbuf and check whether there is any mismatch in the RTP header.
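For reference, the payload type sits in the low 7 bits of the second byte of the fixed RTP header (RFC 3550), with the marker bit in the top bit of the same byte. The sketch below shows how the two values could be compared; it is an illustration only, with hypothetical helper names, and assumes the pointer already addresses the start of the RTP header (after the Ethernet, IP and UDP headers). It is not the check used in st_rx_video_session.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RTP_FIXED_HDR_LEN 12 /* size of the fixed RTP header, RFC 3550 */

/* Return the 7-bit RTP payload type, or -1 if the buffer is too short. */
static int rtp_payload_type(const uint8_t *rtp, size_t len)
{
    assert(rtp != NULL);
    if (len < RTP_FIXED_HDR_LEN)
        return -1;
    return rtp[1] & 0x7f;
}

/* Return the RTP marker bit (0 or 1), or -1 if the buffer is too short. */
static int rtp_marker(const uint8_t *rtp, size_t len)
{
    assert(rtp != NULL);
    if (len < RTP_FIXED_HDR_LEN)
        return -1;
    return (rtp[1] >> 7) & 0x1;
}

/* Non-zero when the received payload type equals the one the session expects. */
static int rtp_pt_matches(const uint8_t *rtp, size_t len, uint8_t expected_pt)
{
    return rtp_payload_type(rtp, len) == (int)expected_pt;
}
```

Logging both the expected value and the result of such an extraction for a few dropped packets shows directly whether the sender and receiver disagree on the payload type.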

prankurgit commented 1 year ago

Dear Mr. Du, Please check my comments

"Can you add a log on below line to print the payload type it expected and the TX is actually sending." [Prankur] You mean what the RX is receiving?

Indeed, when I set --payload_type to 96, I can receive the video without any wrong header errors. There are still some error messages like:

    ...
    MT: Error: dev_rl_shaper_add(0), shaper add error: (-38)Function not implemented
    ...
    MT: Error: DEV(0): Status: imissed 239964 ierrors 0 oerrors 0 rx_nombuf 0
    MT: Error: rx_good_packets: 593
    MT: Error: rx_good_bytes: 784834
    MT: Error: rx_q1_packets: 593
    MT: Error: rx_q1_bytes: 784834
    MT: Error: rx_multicast_packets: 2460738
    MT: Error: rx_multicast_bytes: 3266080246
    MT: Error: tx_multicast_packets: 2
    MT: Error: tx_multicast_bytes: 120
    MT: Error: rx_out_of_buffer: 239964
    ...

which don't make much sense to me. Should I just ignore them?

Please check the complete logs here : media_log.txt

Cheers Prankur

frankdjx commented 1 year ago

Yes, ignore them. dev_rl_shaper_add is the detection of the rate limit feature; a NIC without this feature will report this error. The imissed count only appears at startup, when the system is busy with the initialization routine and has no time to retrieve packets from the NIC.

prankurgit commented 1 year ago

Dear Mr. Du,

Thanks for your comments. I will be closing this issue. Thanks for supporting us for the ConnectX6-Dx. Your help is greatly appreciated.

Cheers Prankur