Aang23 opened 8 months ago
I did the same tests with the UC_200 image, attempting to stream at 245.76MSPS on the exact same system. Unfortunately that does not work at all either, with the same symptoms as the CG_400 image.
Benchmark rate summary:
Num received samples: 0
Num dropped samples: 0
Num overruns detected: 0
Num transmitted samples: 2455190080
Num sequence errors (Tx): 0
Num sequence errors (Rx): 0
Num underruns detected: 1734
Num late commands: 0
Num timeouts (Tx): 0
Num timeouts (Rx): 0
While it does sustain 122.88MSPS a bit better, underruns are still observed along the way: only a few during the benchmark, at random points, but they happen far more often in any other UHD transmit example.
Benchmark rate summary:
Num received samples: 0
Num dropped samples: 0
Num overruns detected: 0
Num transmitted samples: 1228853888
Num sequence errors (Tx): 0
Num sequence errors (Rx): 0
Num underruns detected: 5
Num late commands: 0
Num timeouts (Tx): 0
Num timeouts (Rx): 0
The same system does manage to keep an X310 happy at 200MSPS, however.
Hey, I tried this and got it to work with all the tricks from the KB guide.
I also have a small script that applies them after startup:
#!/bin/bash
size=250000000
# Raise the kernel socket buffer limits and defaults
sudo sysctl -w net.core.wmem_max=$size
sudo sysctl -w net.core.rmem_max=$size
sudo sysctl -w net.core.wmem_default=$size
sudo sysctl -w net.core.rmem_default=$size
# Enable jumbo frames on both SFP-facing interfaces
sudo ip link set dev <dev1> mtu 9000
sudo ip link set dev <dev2> mtu 9000
# Enlarge the NIC TX/RX descriptor rings
sudo ethtool -G <dev1> tx 4096 rx 4096
sudo ethtool -G <dev2> tx 4096 rx 4096
# Pin every CPU core to the performance governor
for ((i=0;i<$(nproc --all);i++)); do sudo cpufreq-set -c $i -r -g performance; done
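The values the script sets can be read back without root; a minimal read-only check (the interface names, as in the script above, are placeholders):

```shell
#!/bin/sh
# Read back the kernel socket buffer limits applied above (no root needed).
for key in wmem_max rmem_max wmem_default rmem_default; do
    printf 'net.core.%s = %s\n' "$key" "$(cat /proc/sys/net/core/$key)"
done
# MTU and ring sizes need the real interface name; <dev1> is a placeholder:
#   ip link show <dev1> | grep -o 'mtu [0-9]*'
#   ethtool -g <dev1>
```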
What arguments do you use for the benchmark? You should use the priority flag. For me the full command looks like this:
$ sudo ./benchmark_rate \
--args "type=x4xx,addr=192.168.10.2,second_addr=192.168.20.2,mgmt_addr=<IPaddr>,master_clock_rate=500e6" \
--priority "high" \
--multi_streamer \
--duration 60 \
--channels "0" \
--rx_rate 500e6 \
--rx_subdev "B:1" \
--tx_rate 500e6 \
--tx_subdev "A:0"
[...]
[00:00:17.641169153] Testing transmit rate 500.000000 Msps on 1 channels
[00:01:17.641902988] Benchmark complete.
Benchmark rate summary:
Num received samples: 29999545568
Num dropped samples: 0
Num overruns detected: 0
Num transmitted samples: 29999401344
Num sequence errors (Tx): 0
Num sequence errors (Rx): 0
Num underruns detected: 0
Num late commands: 0
Num timeouts (Tx): 0
Num timeouts (Rx): 0
Done!
So at least the benchmark works fine. However, I have not yet achieved such high rates with other applications like GNU Radio.
Cheers, Sebastian!
Thanks @basti-schr, unfortunately these are all things I had already done (in order to get RX working at these rates), so that does not help on my end.
I don't expect GNU Radio to handle it at all, but the benchmark not passing and UHD saturating a single CPU core are a bit suspicious to me compared to the behavior seen with other USRPs.
Okay, another thing I forgot to mention is to install DPDK. Especially the last point from this article helped a lot: setting the RT_RUNTIME_SHARE feature flag. This made a big difference for me and brought the underruns down to <5 without DPDK, and to 0 with DPDK.
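RT_RUNTIME_SHARE is a Linux scheduler feature flag toggled through debugfs; its path moved between kernel versions, so a sketch that tries both locations (it requires root to actually write; without root it only reports what it would do):

```shell
#!/bin/sh
# Sketch: enable the RT_RUNTIME_SHARE scheduler feature flag.
# The debugfs location differs between kernel versions, so try both.
for f in /sys/kernel/debug/sched/features /sys/kernel/debug/sched_features; do
    if [ -e "$f" ]; then
        if [ -w "$f" ]; then
            echo RT_RUNTIME_SHARE > "$f" && echo "enabled via $f"
        else
            echo "run as root to enable via $f"
        fi
        break
    fi
done
```

An alternative with a similar effect is lifting the realtime throttling limit entirely with `sysctl -w kernel.sched_rt_runtime_us=-1` (also root-only); which of the two the article intends, I can't confirm from this thread.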
Thanks for the information again @basti-schr. Clearly this should be a bit more obvious than it is now in the documentation! :-)
I have set DPDK up as instructed (v21.11.4). Testing loopback via DPDK examples works as expected, but UHD still does not work at all.
/etc/uhd/uhd.conf
;When present in device args, use_dpdk indicates you want DPDK to take over the UDP transports
;The value here represents a config, so you could have another section labeled use_dpdk=myconf
;instead and swap between them
[use_dpdk=1]
;dpdk_mtu is the NIC's MTU setting
;This is separate from MPM's maximum packet size
dpdk_mtu=9000
;dpdk_driver is the -d flag for the DPDK EAL. If DPDK doesn't pick up the driver for your NIC
;automatically, you may need this argument to point it to the folder where it can find the drivers
;Note that DPDK will attempt to load _everything_ in that folder as a driver, so you may want to
;create a separate folder with symlinks to the librte_pmd_* and librte_mempool_* libraries.
;dpdk_driver=/usr/local/lib/x86_64-linux-gnu/dpdk/pmds-21.0/
;dpdk_corelist is the -l flag for the DPDK EAL. See more at the link
; https://doc.dpdk.org/guides-21.11/linux_gsg/build_sample_apps.html#running-a-sample-application
;Note if you use multiple SFP ports in a streaming application simultaneously,
;you can specify multiple cores in the core list (e.g. 0, 1, 2) and then assign
;them each to the separate SFP port/NIC.
dpdk_corelist=0,1
;dpdk_num_mbufs is the total number of packet buffers allocated
;to each direction's packet buffer pool
;This will be multiplied by the number of NICs, but NICs on the same
;CPU socket share a pool. When using Mellanox NICs, this value must be greater
;than the dpdk_num_desc value in the next section.
dpdk_num_mbufs=4096
;dpdk_mbuf_cache_size is the number of buffers to cache for a CPU
;The cache reduces the interaction with the global pool
dpdk_mbuf_cache_size=64
[dpdk_mac=b8:3f:d2:b6:ff:52]
;Using a separate dpdk_lcore value for each SFP connection/MAC entry
;can possibly result in improved streaming performance. E.g. dpdk_lcore = 2.
dpdk_lcore = 1
dpdk_ipv4 = 192.168.20.1/24
dpdk_num_desc=4096
/etc/default/grub
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
# info -f grub -n 'Simple configuration'
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX="iommu=pt intel_iommu=on hugepages=2048"
# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"
# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console
# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480
# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true
# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"
# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"
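After editing /etc/default/grub, the new command line has to be applied with `sudo update-grub` and a reboot; a minimal post-reboot check that the IOMMU and hugepage arguments actually took effect (assuming the arguments shown above):

```shell
#!/bin/sh
# Verify the kernel booted with the expected arguments.
# (Run after: sudo update-grub && sudo reboot)
grep -o 'iommu=pt\|intel_iommu=on\|hugepages=[0-9]*' /proc/cmdline || \
    echo "expected arguments not present in /proc/cmdline"
```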
Benchmark command:
sudo ./examples/benchmark_rate --tx_rate 491.52e6 --args use_dpdk=1,mgmt_addr0=10.10.10.130,addr0=192.168.20.2
This results in an RFNoC error, about which I am unable to find much information in this context.
[INFO] [UHD] linux; GNU C++ version 11.4.0; Boost_107400; DPDK_21.11; UHD_4.6.0.HEAD-0-g50fa3baa
EAL: Detected CPU lcores: 36
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 1048576 kB hugepages reported
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:04:00.0 (socket 0)
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:04:00.1 (socket 0)
TELEMETRY: No legacy callbacks, legacy socket not created
[00:00:00.000174] Creating the usrp device with: use_dpdk=1,mgmt_addr0=10.10.10.130,addr0=192.168.20.2...
[INFO] [MPMD] Initializing 1 device(s) in parallel with args: mgmt_addr=10.10.10.130,type=x4xx,product=x410,serial=32C3D84,name=ni-x4xx-32C3D84,fpga=CG_400,claimed=False,use_dpdk=1,mgmt_addr0=10.10.10.130,addr0=192.168.20.2
[INFO] [MPM.PeriphManager] init() called with device args `fpga=CG_400,mgmt_addr=10.10.10.130,name=ni-x4xx-32C3D84,product=x410,use_dpdk=1,clock_source=internal,time_source=internal,initializing=True'.
[ERROR] [RFNOC::GRAPH] Error during initialization of block 0/Radio#0!
[ERROR] [RFNOC::GRAPH] Caught exception while initializing graph: RfnocError: OpTimeout: Control operation timed out waiting for space in command buffer
Error: RuntimeError: Failure to create rfnoc_graph.
Unfortunately, RT_RUNTIME_SHARE does not seem to have any effect here.
Okay, from this point I can only guess: one thing I did was install the Mellanox OFED driver with DPDK support (./mlnxofedinstall --dpdk). Maybe that's why the EAL reports IOVA mode as 'PA' for you, but 'VA' for me.
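As a hedged aside: the EAL normally selects 'VA' IOVA mode when a usable IOMMU is present, so one quick thing to check is whether the kernel actually exposes IOMMU groups (an empty directory suggests `intel_iommu=on`/`iommu=pt` did not take effect):

```shell
#!/bin/sh
# If the IOMMU is active, /sys/kernel/iommu_groups is populated; an empty
# or missing directory can force the DPDK EAL into 'PA' IOVA mode.
if [ -d /sys/kernel/iommu_groups ] && \
   [ -n "$(ls -A /sys/kernel/iommu_groups 2>/dev/null)" ]; then
    echo "IOMMU groups present: $(ls /sys/kernel/iommu_groups | wc -l)"
else
    echo "no IOMMU groups found"
fi
```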
$ grep Huge /proc/meminfo
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 2048
HugePages_Free: 2000
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 4194304 kB
But this message from EAL is normal:
EAL: No free 1048576 kB hugepages reported on node 0
EAL: No available 1048576 kB hugepages reported
I commented out dpdk_num_desc because I ran into other problems and just left it at the default setting.
/etc/uhd/uhd.conf
[use_dpdk=1]
dpdk_mtu=9000
dpdk_driver=/usr/lib/x86_64-linux-gnu/dpdk/pmds-22.0
dpdk_corelist=2,3,4
dpdk_num_mbufs=4095
dpdk_mbuf_cache_size=64
# dpdk_link_timeout=5000
[dpdk_mac=08:c0:eb:97:8c:ee]
dpdk_lcore = 3
dpdk_ipv4 = 192.168.10.1/24
# dpdk_num_desc=4096
[dpdk_mac=08:c0:eb:97:8c:ef]
dpdk_lcore = 4
dpdk_ipv4 = 192.168.20.1/24
# dpdk_num_desc=4096
Also note that I have configured the dpdk_driver option.
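The PMD directory that dpdk_driver should point at varies by DPDK release and distro (pmds-21.0, pmds-22.0, ...); one way to locate it, with the search paths as examples:

```shell
#!/bin/sh
# Locate the DPDK PMD plugin directory; the version suffix depends on
# the installed DPDK release and the install prefix.
find /usr/lib /usr/local/lib -type d -name 'pmds-*' 2>/dev/null
# The mlx5 PMD library itself can be found with:
find /usr/lib /usr/local/lib -name 'librte_net_mlx5*' 2>/dev/null || true
```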
I hope this gives you some hints; it also took me some weeks to figure out the right setup.
@basti-schr I ended up figuring it out before I saw your reply. My changes were very similar to yours, and now, aside from heavy drops for a few seconds while the buffers settle, everything is functional.
Thanks a lot for your hints! However, it's honestly disappointing how badly this is documented. The high bandwidth is advertised as one of the main features, but it has taken a lot of research to get anywhere near it :-)
It would be good if NI/Ettus could add this in the main x4xx documentation. I'll probably leave this open for this purpose.
I agree that bad documentation is a bug.
I want to know: should I install the NIC driver first (./mlnxofedinstall --dpdk) and then install DPDK (sudo apt install dpdk dpdk-dev), or the other way around?
Hello!
Lately I have been doing some tests and writing code with a USRP X410 I have access to for the moment (via https://twitter.com/SDR_Radio).
Getting it working after flashing the correct image was not too complicated, except that some documentation was lacking (specifically, the MTU configuration is mentioned in the x3xx documentation but entirely missing on the x4xx side of things).
In receive, after flashing the CG_400 image and doing all the required network configuration, I am able to stream the full 491.52MSPS with no issues at all using SatDump. The benchmark_rate example also shows no drops, overflows, etc. with cf32 host format and sc16 wire format.
However, when I got to attempting to transmit, nothing I wrote was able to stream at the expected full rate of 491.52MSPS without major underruns. This occurs even with a simple while loop sending large buffers, with UHD seemingly saturating that thread already. It is visible enough that the transmit (red) LED on the USRP blinks constantly. The same behavior is observed using the provided benchmark_rate example. (I have also attempted modifying various buffer sizes and the network configuration, to no avail.)
RX Bench :
TX Bench :
The X410 is connected via QSFP+ (100GbE) to Mellanox cards in a high-end workstation:
Mellanox Technologies MT27800 Family [ConnectX-5]
Intel i9-10980XE (36) @ 4.600GHz
I have tested raw streaming over the 100GbE link, and the machine is able to sustain far more than required to feed the X410 in this configuration. Considering the setup in use, I would expect this to work.
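The thread doesn't say how the raw test was done; one common way to sanity-check link throughput independently of UHD/DPDK is iperf3 between the workstation and a second test host (the address below is a placeholder, not the USRP), using several parallel streams since a single TCP stream rarely saturates a 100GbE link:

```shell
#!/bin/sh
# Sketch of a raw-throughput sanity check, independent of UHD/DPDK.
# 192.168.20.3 is a placeholder for a second test host (not the USRP).
# Server side:  iperf3 -s
if command -v iperf3 >/dev/null; then
    echo "would run: iperf3 -c 192.168.20.3 -P 8 -t 30"
else
    echo "iperf3 not installed (apt install iperf3)"
fi
```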