EttusResearch / uhd

The USRP™ Hardware Driver Repository
http://uhd.ettus.com

Nvidia NX cannot receive high sample rates with B205-mini-i #672

Closed curtesian closed 5 months ago

curtesian commented 1 year ago

Issue Description

I would like to receive samples at the max sample rate of the B205-mini-i (56 MHz) on an Nvidia NX device. However, when running the benchmark_rate example, I get a lot of overflows/dropped samples. In order to not have any dropped samples, I need to reduce the rate to about 10 MHz, which is much lower than what the B205-mini-i is rated for. I did not encounter any dropped samples when running the same benchmark_rate test on a desktop PC.

Setup Details

UHD/FPGA Version: UHD_4.1.0.6-120-g7e1ff96a
OS: Ubuntu 20.04 on Nvidia Jetpack 5.0.2 (Linux Kernel 5.10)
C++ Version: 9.4.0
Hardware: B205-mini-i, Jetson Xavier NX 16GB RAM, Seeed A203v2 carrier board
Connection: USB 3.0

Expected Behavior

I expect there to be no dropped samples at 56 MHz to replicate what the B205-mini-i is spec'd for and the performance on a desktop PC.

Actual Behaviour

uas@ubuntu:/usr/local/lib/uhd/examples$ ./benchmark_rate --rx_rate 56e6

[INFO] [UHD] linux; GNU C++ version 9.4.0; Boost_107100; UHD_4.1.0.6-120-g7e1ff96a
[INFO] [B200] Loading firmware image: /usr/local/share/uhd/images/usrp_b200_fw.hex...
[00:00:00.022239] Creating the usrp device with: ...
[INFO] [B200] Detected Device: B205mini
[INFO] [B200] Loading FPGA image: /usr/local/share/uhd/images/usrp_b205mini_fpga.bin...
[INFO] [B200] Operating over USB 3.
[INFO] [B200] Initialize CODEC control...
[INFO] [B200] Initialize Radio control...
[INFO] [B200] Performing register loopback test... 
[INFO] [B200] Register loopback test passed
[INFO] [B200] Setting master clock rate selection to 'automatic'.
[INFO] [B200] Asking for clock rate 16.000000 MHz... 
[INFO] [B200] Actually got clock rate 16.000000 MHz.
Using Device: Single USRP:
  Device: B-Series Device
  Mboard 0: B205mini
  RX Channel: 0
    RX DSP: 0
    RX Dboard: A
    RX Subdev: FE-RX1
  TX Channel: 0
    TX DSP: 0
    TX Dboard: A
    TX Subdev: FE-TX1

[00:00:10.980245463] Setting device timestamp to 0...
[INFO] [B200] Asking for clock rate 56.000000 MHz... 
[INFO] [B200] Actually got clock rate 56.000000 MHz.
[00:00:11.343532337] Testing receive rate 56.000000 Msps on 1 channels
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO[00:00:21.394920020] Benchmark complete.

Benchmark rate summary:
  Num received samples:     537798537
  Num dropped samples:      24995184
  Num overruns detected:    172
  Num transmitted samples:  0
  Num sequence errors (Tx): 0
  Num sequence errors (Rx): 0
  Num underruns detected:   0
  Num late commands:        0
  Num timeouts (Tx):        0
  Num timeouts (Rx):        0

Done!
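
To put the summary above in perspective, here is a small sketch that turns the counters into a loss percentage. It assumes that the expected total is the sum of received and dropped samples, which is my reading of the summary rather than something the benchmark_rate output states explicitly.

```python
# Figures copied from the benchmark summary above.
received = 537_798_537
dropped = 24_995_184
overruns = 172


def drop_fraction(received: int, dropped: int) -> float:
    """Fraction of expected samples that were lost.

    Assumption: expected total = received + dropped.
    """
    total = received + dropped
    return dropped / total if total else 0.0


frac = drop_fraction(received, dropped)
avg_per_overrun = dropped // overruns  # average gap size per overrun event
print(f"dropped {frac:.2%} of samples, about {avg_per_overrun} per overrun")
```

The average gap per overrun is only a rough indicator, since individual overruns can differ a lot in size.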

Steps to reproduce the problem

On the Nvidia NX, navigate to the examples directory of the built UHD library and run: ./benchmark_rate --rx_rate 56e6

Additional Information

I found a discussion from 2018 in the gnuradio archive, where the author was using an ARM processor with an E310 and failing to get a high sample rate.

However in my case, the Nvidia NX is a pretty beefy ARM processor, and I would expect better performance from an Nvidia NX than a regular old Raspberry Pi, Odroid, etc.

In fact, a company called Deepwave Digital advertises an SDR solution that uses an Nvidia TX2-i (less powerful than an NX) to achieve 100 MHz of bandwidth over 2 channels and 125 Msps per channel. They use a completely different SoapySDR solution (and I am not really interested in using their products at the moment), but it shows that an edge device like the TX2 or Nvidia NX is capable of handling a high sample rate.

Are there any workarounds to get an ARM processor like the Nvidia NX to work with a UHD USRP device at a high sample rate?

Thank you! Curtis

curtesian commented 1 year ago

Here are some things I've done over the last few days to ensure that the Nvidia NX can handle USB 3.0 speeds:

  1. Increased USBFS buffer size. The default buffer size for USB 3.0 on Jetson systems is 16 MB. For high resolution cameras (and potentially high rate USRP devices), this should be increased according to this guide from Allied Vision. I set the kernel parameter usbcore.usbfs_memory_mb=1024 and also tried 2048 and 4096, but sizes above 1024 MB made no further difference.
  2. Turned off USB autosuspend. Jetson systems suspend inactive USB ports. Although this shouldn't be a problem, I wanted to rule out the NX suspending the USB port while the USRP is receiving samples. Following this JetsonHacks guide, I turned off autosuspend by adding usbcore.autosuspend=-1 to the end of the APPEND line in /boot/extlinux/extlinux.conf and rebooted the system.
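
To confirm that both kernel parameters actually took effect after the reboot, something like the following sketch could be used. It assumes the usual sysfs location for usbcore module parameters (/sys/module/usbcore/parameters); the helper name read_int_param is my own.

```python
from pathlib import Path
from typing import Optional

# Assumed standard sysfs location for usbcore module parameters on Linux.
USBCORE_PARAMS = Path("/sys/module/usbcore/parameters")


def read_int_param(path: Path) -> Optional[int]:
    """Return the integer stored in a sysfs parameter file, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    for name in ("usbfs_memory_mb", "autosuspend"):
        print(name, "=", read_int_param(USBCORE_PARAMS / name))
```

With the settings described above, the expected values would be 1024 and -1 respectively; None means the file could not be read on that system.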

With these two changes, I can get 0 dropped samples consistently at 20 Msps, and can push up to 30 Msps with 1-15 overruns. Anything above 30 Msps and I get large amounts of overflows identical to the original behavior at 56 Msps. I think increasing the USB buffer size may have helped a little, but it did not solve the problem.

Any further advice or guidance would be greatly appreciated. I would really like to receive samples consistently at sample rates >= 50 Msps on this B205 device.

GillesC commented 1 year ago

Hi @cmanore25, Just wanted to add that with a Raspberry Pi 4 the benchmark at 56Msps results in no errors with a B210:

./benchmark_rate --rx_rate 56e6

[INFO] [UHD] linux; GNU C++ version 10.2.1 20210110; Boost_107400; UHD_4.1.0.6-4-g462d11b1
[00:00:00.044815] Creating the usrp device with: ...
[INFO] [B200] Detected Device hello 6: B210
[INFO] [B200] Operating over USB 3.
[INFO] [B200] Initialize CODEC control...
[INFO] [B200] Initialize Radio control...
[INFO] [B200] Performing register loopback test...
[INFO] [B200] Register loopback test passed
[INFO] [B200] Performing register loopback test...
[INFO] [B200] Register loopback test passed
[INFO] [B200] Setting master clock rate selection to 'automatic'.
[INFO] [B200] Asking for clock rate 16.000000 MHz...
[INFO] [B200] Actually got clock rate 16.000000 MHz.
Using Device: Single USRP:
  Device: B-Series Device
  Mboard 0: B210
  RX Channel: 0
    RX DSP: 0
    RX Dboard: A
    RX Subdev: FE-RX2
  RX Channel: 1
    RX DSP: 1
    RX Dboard: A
    RX Subdev: FE-RX1
  TX Channel: 0
    TX DSP: 0
    TX Dboard: A
    TX Subdev: FE-TX2
  TX Channel: 1
    TX DSP: 1
    TX Dboard: A
    TX Subdev: FE-TX1

[00:00:04.243729368] Setting device timestamp to 0...
[INFO] [B200] Asking for clock rate 56.000000 MHz...
[INFO] [B200] Actually got clock rate 56.000000 MHz.
[00:00:04.627604459] Testing receive rate 56.000000 Msps on 1 channels
[00:00:14.678405444] Benchmark complete.

Benchmark rate summary:
  Num received samples:     562792569
  Num dropped samples:      0
  Num overruns detected:    0
  Num transmitted samples:  0
  Num sequence errors (Tx): 0
  Num sequence errors (Rx): 0
  Num underruns detected:   0
  Num late commands:        0
  Num timeouts (Tx):        0
  Num timeouts (Rx):        0

Done!

Since both boards have a similar CPU (ARMv8), this shows it can at least work on ARM. I hope this helps point you toward the cause of the drops. Let me know if you would like me to do some additional testing with an RPi 4.

wordimont commented 1 year ago

Is it possible the bus is actually operating at USB 2.0 speed with the Nvidia NX device?
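
As a rough back-of-the-envelope check on why link speed matters here, the sketch below compares the raw data rate of the stream against nominal USB signalling rates. It assumes the sc16 wire format (4 bytes per complex sample) and ignores all protocol overhead, so real usable throughput is lower than these numbers.

```python
BYTES_PER_SC16_SAMPLE = 4  # 16-bit I + 16-bit Q

# Nominal signalling rates in Mbit/s (not usable throughput).
USB2_MBIT = 480
USB3_MBIT = 5000


def required_mbit_per_s(sample_rate_sps: float,
                        bytes_per_sample: int = BYTES_PER_SC16_SAMPLE) -> float:
    """Raw payload rate in Mbit/s for a single-channel sample stream."""
    return sample_rate_sps * bytes_per_sample * 8 / 1e6


def max_rate_sps(link_mbit: float,
                 bytes_per_sample: int = BYTES_PER_SC16_SAMPLE) -> float:
    """Upper bound on sample rate for a link, ignoring all overhead."""
    return link_mbit * 1e6 / (bytes_per_sample * 8)


need = required_mbit_per_s(56e6)
print(f"56 Msps needs {need:.0f} Mbit/s of payload")
print(f"USB 2.0 ceiling: {max_rate_sps(USB2_MBIT) / 1e6:.0f} Msps")
print(f"USB 3.0 ceiling: {max_rate_sps(USB3_MBIT) / 1e6:.2f} Msps")
```

A link that had fallen back to USB 2.0 could not carry much more than about 10 Msps in practice, which is why the reported behavior makes the negotiated speed worth checking.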

curtesian commented 1 year ago

Here is the bus info when running lsusb -t

$ lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=tegra-xusb/4p, 10000M
    |__ Port 3: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 4: Dev 4, If 0, Class=Vendor Specific Class, Driver=, 5000M
        |__ Port 4: Dev 4, If 1, Class=Vendor Specific Class, Driver=, 5000M
        |__ Port 4: Dev 4, If 2, Class=Vendor Specific Class, Driver=, 5000M
        |__ Port 4: Dev 4, If 3, Class=Vendor Specific Class, Driver=, 5000M
        |__ Port 4: Dev 4, If 4, Class=Vendor Specific Class, Driver=, 5000M

I confirmed that Bus 02 is the right port that the B205-mini-i is plugged in to by comparing the output of lsusb when the device is plugged in vs unplugged.

It looks like the root hub is operating at USB 3.1 Gen 2 speed (10000M), with the device itself operating at USB 3.0 speed (5000M). @GillesC if you don't mind, could you post your lsusb -t output for the B210 plugged into the Raspberry Pi? I am curious how the B210 appears on the bus compared to what the Nvidia NX is showing.
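
For anyone who wants to check the negotiated speeds programmatically, here is a minimal sketch that pulls the port, device, driver, and speed fields out of lsusb -t text. The regular expression is written against the output format shown above and is an assumption; other usbutils versions may format lines differently.

```python
import re
from typing import List, NamedTuple


class UsbEntry(NamedTuple):
    port: int
    dev: int
    driver: str
    speed_mbit: int


# Matches e.g. "Port 4: Dev 4, If 0, Class=..., Driver=, 5000M"
_LINE_RE = re.compile(r"Port (\d+): Dev (\d+),.*?Driver=([^,]*), (\d+)M")


def parse_lsusb_tree(text: str) -> List[UsbEntry]:
    """Extract one entry per matching line of `lsusb -t` output."""
    entries = []
    for line in text.splitlines():
        m = _LINE_RE.search(line)
        if m:
            entries.append(
                UsbEntry(int(m.group(1)), int(m.group(2)), m.group(3), int(m.group(4)))
            )
    return entries


sample = """\
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=tegra-xusb/4p, 10000M
    |__ Port 3: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 4: Dev 4, If 0, Class=Vendor Specific Class, Driver=, 5000M
"""

for entry in parse_lsusb_tree(sample):
    print(entry)
```

Any entry on the USRP's path that reports 480M would indicate a fallback to USB 2.0.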

I had some email coordination with Ettus support, and tried these recommendations:

  1. Setting CPU Governor to performance. This was done by following the USRP Performance Tips and Tricks Guide.
  2. Adding the arguments --rx_otw=sc16 --rx_cpu=sc16 to the benchmark_rate test to skip the complex short to complex float conversion.
  3. Adding the arguments --args="recv_frame_size=10000,num_recv_frames=128" to the benchmark_rate test as well.
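
Put together, recommendations 2 and 3 amount to a command line along the lines of the sketch below. It assumes the intended CPU format is sc16 and that the device arguments are passed through benchmark_rate's --args option; the helper build_benchmark_cmd is hypothetical and only assembles the argument list.

```python
import shlex
from typing import List


def build_benchmark_cmd(rx_rate: str = "56e6",
                        otw_format: str = "sc16",
                        cpu_format: str = "sc16",
                        device_args: str = "") -> List[str]:
    """Assemble an argv list for UHD's benchmark_rate example (RX only)."""
    cmd = [
        "./benchmark_rate",
        "--rx_rate", rx_rate,
        "--rx_otw", otw_format,   # over-the-wire sample format
        "--rx_cpu", cpu_format,   # host-side sample format (sc16 avoids float conversion)
    ]
    if device_args:
        cmd += ["--args", device_args]
    return cmd


cmd = build_benchmark_cmd(device_args="recv_frame_size=10000,num_recv_frames=128")
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

The list form can be handed to subprocess.run directly from the examples directory, which avoids any shell quoting issues with the comma-separated device arguments.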

While these helped get the rate up to 30 MHz without dropping samples, it is still dropping samples at 56 MHz.

michaelld commented 1 year ago

Just to mention for completeness: some USB controllers do not provide enough power for the B20x to work reliably. The B200 / B210 take an optional power supply that removes this as an issue; not so for the B20x minis, unfortunately.

GillesC commented 1 year ago

Output of lsusb -t:

/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
    |__ Port 2: Dev 3, If 4, Class=Vendor Specific Class, Driver=, 5000M
    |__ Port 2: Dev 3, If 2, Class=Vendor Specific Class, Driver=, 5000M
    |__ Port 2: Dev 3, If 0, Class=Vendor Specific Class, Driver=, 5000M
    |__ Port 2: Dev 3, If 3, Class=Vendor Specific Class, Driver=, 5000M
    |__ Port 2: Dev 3, If 1, Class=Vendor Specific Class, Driver=, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M

where the USRP is Bus 002 Device 003: ID 2500:0020 Ettus Research LLC USRP B210.

I am not at the office, I'll check tomorrow if the USRPs are externally powered or not.

EDIT: They were not externally powered.

mbr0wn commented 5 months ago

This looks too specific to the Nvidia NX.