geerlingguy / raspberry-pi-pcie-devices

Raspberry Pi PCI Express device compatibility database
http://pipci.jeffgeerling.com
GNU General Public License v3.0

Test Rosewill RC-20001 2.5GBASE-T PCIe x1 Network Adapter #40

Closed (geerlingguy closed this issue 2 years ago)

geerlingguy commented 3 years ago

I just received a Rosewill RC-20001 2.5GBASE-T PCIe x1 Network Adapter. It should be able to get full speed out of the Pi.

It has the RTL8125B chip on it, and Realtek has a drivers page for the card here.

[Image: DSC_3183]

The idea I have for this card is setting it up as a 2.5 GbE NAS, which will have enough bandwidth to (hopefully) copy 200+ MB/sec over the network. We'll see!
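As a rough sanity check on that 200+ MB/sec target, the arithmetic can be sketched as below. The helper name `mbit_to_mbyte` is made up for illustration, and the figures are raw line rates that ignore Ethernet, IP, and TCP overhead, so real file copies will land somewhat lower.

```shell
# Convert a line rate in Mbit/s to MB/s (1 byte = 8 bits).
# mbit_to_mbyte is a hypothetical helper name, not part of any tool.
mbit_to_mbyte() {
    awk -v mbit="$1" 'BEGIN { printf "%.1f\n", mbit / 8 }'
}

# 2.5GBASE-T line rate: 2500 Mbit/s
mbit_to_mbyte 2500

# PCIe Gen 2 x1: 5 GT/s with 8b/10b encoding leaves 4000 Mbit/s of raw link bandwidth
mbit_to_mbyte 4000
```

This prints 312.5 and 500.0, so on paper both the Ethernet link and the single PCIe lane leave headroom above 200 MB/sec.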

PixlRainbow commented 3 years ago

Interesting choice. 😅 The top Amazon review was from a Linux user who stated that drivers were no longer available for download.

EDIT: You're in luck; community drivers were added to the mainline Linux git repo just a few months back, within this year. However, they appear to be only in the bleeding-edge kernel versions for now.

geerlingguy commented 3 years ago
$ sudo lspci -vvvv
...
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8125 (rev 04)
    Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 255
    Region 0: I/O ports at <unassigned> [disabled]
    Region 2: Memory at 600000000 (64-bit, non-prefetchable) [disabled] [size=64K]
    Region 4: Memory at 600020000 (64-bit, non-prefetchable) [disabled] [size=16K]
    [virtual] Expansion ROM at 600010000 [disabled] [size=64K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Address: 0000000000000000  Data: 0000
        Masking: 00000000  Pending: 00000000
    Capabilities: [70] Express (v2) Endpoint, MSI 01
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE#
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [b0] MSI-X: Enable- Count=32 Masked-
        Vector table: BAR=4 offset=00000000
        PBA: BAR=4 offset=00000800
    Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: Input/output error
        Not readable
    Capabilities: [100 v2] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    Capabilities: [148 v1] Virtual Channel
        Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:   ArbSelect=Fixed
        Status: InProgress-
        VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status: NegoPending- InProgress-
    Capabilities: [168 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
    Capabilities: [178 v1] Transaction Processing Hints
        No steering table available
    Capabilities: [204 v1] Latency Tolerance Reporting
        Max snoop latency: 0ns
        Max no snoop latency: 0ns
    Capabilities: [20c v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
              PortCommonModeRestoreTime=150us PortTPowerOnTime=150us
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
               T_CommonMode=0us LTR1.2_Threshold=0ns
        L1SubCtl2: T_PwrOn=10us
    Capabilities: [21c v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
geerlingguy commented 3 years ago

@PixlRainbow - I was specifically recommended this card by someone who is familiar with those drivers :)

I'll have to see how I get along!

geerlingguy commented 3 years ago
$ dmesg
...
[    1.010649] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
[    1.010686] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
[    1.010761] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x0603ffffff -> 0x00f8000000
[    1.010833] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x00ffffffff -> 0x0100000000
[    1.027143] brcm-pcie fd500000.pcie: link up, 5 GT/s x1 (SSC)
[    1.027451] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
[    1.027482] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.027510] pci_bus 0000:00: root bus resource [mem 0x600000000-0x603ffffff] (bus address [0xf8000000-0xfbffffff])
[    1.027581] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
[    1.027813] pci 0000:00:00.0: PME# supported from D0 D3hot
[    1.031443] pci 0000:00:00.0: bridge configuration invalid ([bus ff-ff]), reconfiguring
[    1.031656] pci 0000:01:00.0: [10ec:8125] type 00 class 0x020000
[    1.031757] pci 0000:01:00.0: reg 0x10: [io  0x0000-0x00ff]
[    1.031826] pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x0000ffff 64bit]
[    1.031878] pci 0000:01:00.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit]
[    1.031919] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0000ffff pref]
[    1.032166] pci 0000:01:00.0: supports D1 D2
[    1.032190] pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    1.035680] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    1.035734] pci 0000:00:00.0: BAR 8: assigned [mem 0x600000000-0x6000fffff]
[    1.035768] pci 0000:01:00.0: BAR 2: assigned [mem 0x600000000-0x60000ffff 64bit]
[    1.035820] pci 0000:01:00.0: BAR 6: assigned [mem 0x600010000-0x60001ffff pref]
[    1.035852] pci 0000:01:00.0: BAR 4: assigned [mem 0x600020000-0x600023fff 64bit]
[    1.035902] pci 0000:01:00.0: BAR 0: no space for [io  size 0x0100]
[    1.035927] pci 0000:01:00.0: BAR 0: failed to assign [io  size 0x0100]
[    1.035953] pci 0000:00:00.0: PCI bridge to [bus 01]
[    1.035984] pci 0000:00:00.0:   bridge window [mem 0x600000000-0x6000fffff]

Interesting, there was another message waaaay later:

[ 2791.685514] pci 0000:01:00.0: invalid short VPD tag 00 at offset 1
geerlingguy commented 3 years ago

Trying with the r8125-9.004.01.tar.bz2 driver download (requiring an email address and CAPTCHA to get) from their website on the chip page: https://www.realtek.com/en/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-pci-express-software

tar vjxf r8125-9.004.01.tar.bz2
cd r8125-9.004.01/
sudo ./autorun.sh

On the Pi OS 64-bit ARM kernel, I got the error message:

Check old driver and unload it.
Build the module and install
make[2]: *** /lib/modules/5.4.51-v8+/build: No such file or directory.  Stop.
make[1]: *** [Makefile:167: clean] Error 2
make: *** [Makefile:48: clean] Error 2

Going to have to find a way to get the kernel headers... I have run into this three times now—installing raspberrypi-kernel-headers only gets you the 32-bit headers :-/

geerlingguy commented 3 years ago

To get headers for 64-bit Pi OS for now:

  1. Make sure everything's up to date with sudo apt-get dist-upgrade -y
  2. Copy out this gist, give it execute permissions, and run it with sudo ./script-here.sh
  3. Run: sudo dpkg -i /root/workdir/build/linux-image-5.10.3-v8+_arm64.deb (substitute the correct version here)
  4. Run: sudo ln -s /lib/modules/5.10.3-v8+/build /lib/modules/5.4.79-v8+/build (substitute the correct version here)

The script also installs packages such as build-essential that you need for compiling anything against the kernel, so it does pretty much everything for you. Note also that you may need to adjust the versions (e.g. 5.10.3 and 5.4.79) depending on what versions of the kernel you currently have installed and/or compiled.

Picked up from https://github.com/raspberrypi/Raspberry-Pi-OS-64bit/issues/4

(Edit: Alternatively, I could compile a new kernel in my cross-compile environment with the new driver in it...)

(Edit 2: Apparently there is a package available for 64-bit Pi OS now... sudo apt install raspberrypi-kernel-headers — note that you will still need to build your own version of the headers from source if you are building a custom kernel that's newer than the released version).
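Before running a driver's build script, it can help to confirm that a build directory actually exists for the kernel in question, since that is what the "No such file or directory" error above is complaining about. The following is a minimal sketch; `have_kernel_headers` is a hypothetical helper name, and the optional root prefix exists only so the check can be exercised outside a real system.

```shell
# Report whether a kernel build directory exists for a given kernel release.
# $1 = kernel release (defaults to the running kernel, via uname -r)
# $2 = optional root prefix (defaults to empty; only useful for testing)
# have_kernel_headers is a hypothetical helper name.
have_kernel_headers() {
    release="${1:-$(uname -r)}"
    root="${2:-}"
    build_dir="$root/lib/modules/$release/build"
    # -e follows symlinks, so a dangling "build" symlink counts as missing.
    if [ -e "$build_dir" ]; then
        echo "found: $build_dir"
    else
        echo "missing: $build_dir"
        return 1
    fi
}
```

Running `have_kernel_headers` with no arguments checks the running kernel. Because the test follows symlinks, it also catches the case where the `ln -s` step above points at a directory that does not exist.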

geerlingguy commented 3 years ago

Hmm... that method results in:

pi@raspberrypi:~/r8125-9.004.01 $ sudo ./autorun.sh
...
Warning: modules_install: missing 'System.map' file. Skipping depmod.
DEPMOD 5.4.79-v8+
load module r8125
modprobe: FATAL: Module r8125 not found in directory /lib/modules/5.4.79-v8+
Updating initramfs. Please wait.

I think I'm going to go the cross-compile route instead.

geerlingguy commented 3 years ago

Cross compiling with the following driver added:

Device Drivers
  > Network device support
    > Ethernet driver support
      > Realtek devices
        > Realtek 8169/8168/8101/8125 ethernet support
geerlingguy commented 3 years ago

All right, I also completely revamped my process for the cross-compile. I can now do it all remotely, against the running Pi. Easier than I thought it'd be... I'm going to document my experience in the cross-compile README in a few.

Anyways, now I rebooted and am getting:

[    4.066785] pci 0000:00:00.0: enabling device (0000 -> 0002)
[    4.066831] r8169 0000:01:00.0: enabling device (0000 -> 0002)
...
[    4.106184] r8169 0000:01:00.0 eth1: RTL8125B, 68:1c:a2:13:4e:ee, XID 641, IRQ 48
[    4.106211] r8169 0000:01:00.0 eth1: jumbo features [frames: 9194 bytes, tx checksumming: ko]
...
[    6.611592] r8169 0000:01:00.0: Direct firmware load for rtl_nic/rtl8125b-2.fw failed with error -2
[    6.611605] r8169 0000:01:00.0: Unable to load firmware rtl_nic/rtl8125b-2.fw (-2)
[    6.636079] RTL8226B_RTL8221B 2.5Gbps PHY r8169-100:00: attached PHY driver [RTL8226B_RTL8221B 2.5Gbps PHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
[    6.760225] r8169 0000:01:00.0 eth1: Link is Down

Also checked sudo apt-get install firmware-realtek and it's already installed.
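Error -2 in those log lines is ENOENT, meaning the kernel could not find the file in its firmware search path. One possible explanation is that the installed firmware-realtek package predates that particular file, though that is a guess. A small sketch for pulling the missing firmware names out of a kernel log follows; `missing_firmware` is a hypothetical helper name, and it only matches the exact message format shown above.

```shell
# Read kernel log lines on stdin and print the firmware paths that failed
# to load with error -2 (ENOENT, file not found).
# missing_firmware is a hypothetical helper name.
missing_firmware() {
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed with error -2.*/\1/p'
}
```

A typical use would be `dmesg | missing_firmware`, followed by checking whether the reported path exists under /lib/firmware (for example with `ls /lib/firmware/rtl_nic/`).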

But checking the interface itself:

$ ip a
...
3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 68:1c:a2:13:4e:ee brd ff:ff:ff:ff:ff:ff

Let's plug in a cable...

$ ip a
...
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 68:1c:a2:13:4e:ee brd ff:ff:ff:ff:ff:ff
    inet 10.0.100.182/24 brd 10.0.100.255 scope global dynamic noprefixroute eth1
       valid_lft 86400sec preferred_lft 75600sec
    inet6 fe80::e3a1:f9c6:4209:7cf9/64 scope link 
       valid_lft forever preferred_lft forever

Blinky lights are good. Now I need to get to setting up my 10G network so I can test a connection between my Mac and this card!

geerlingguy commented 3 years ago

Gigabit to my Mac:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver
geerlingguy commented 3 years ago

As they say in the country, yee-haw:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.23 GBytes  1.91 Gbits/sec    0             sender
[  5]   0.00-10.01  sec  2.22 GBytes  1.91 Gbits/sec                  receiver

Setup currently (it's a mess!):

[Image: IMG_3127]

geerlingguy commented 3 years ago

Also, we're hitting the interrupt (IRQ) limits here, same as I did when testing the Intel I340-T4 (in #3).

[Image: Screen Shot 2020-12-22 at 4 26 30 PM]
geerlingguy commented 3 years ago

To set MTU to 9000:

First, apply the patch below to the linux source checkout with git apply -v name-of-file-with-contents-below.patch:

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 62051e353278..81e3da888d1a 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -52,7 +52,7 @@
 #define GENET_Q16_TX_BD_CNT    \
    (TOTAL_DESC - priv->hw_params->tx_queues * priv->hw_params->tx_bds_per_q)

-#define RX_BUF_LENGTH      2048
+#define RX_BUF_LENGTH      10240
 #define SKB_ALIGNMENT      32

 /* Tx/Rx DMA register offset, skip 256 descriptors */
diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 41a518336673..28cac902cb77 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -22,8 +22,8 @@
 /*
  * According to 802.3ac, the packet can be 4 bytes longer. --Klika Jan
  */
-#define VLAN_ETH_DATA_LEN  1500    /* Max. octets in payload    */
-#define VLAN_ETH_FRAME_LEN 1518    /* Max. octets in frame sans FCS */
+#define VLAN_ETH_DATA_LEN  9000    /* Max. octets in payload    */
+#define VLAN_ETH_FRAME_LEN 9018    /* Max. octets in frame sans FCS */

 #define VLAN_MAX_DEPTH 8       /* Max. number of nested VLAN tags parsed */

diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
index d6de2b167448..78a12dd0e542 100644
--- a/include/uapi/linux/if_ether.h
+++ b/include/uapi/linux/if_ether.h
@@ -33,8 +33,8 @@
 #define ETH_TLEN   2       /* Octets in ethernet type field */
 #define ETH_HLEN   14      /* Total octets in header.   */
 #define ETH_ZLEN   60      /* Min. octets in frame sans FCS */
-#define ETH_DATA_LEN   1500        /* Max. octets in payload    */
-#define ETH_FRAME_LEN  1514        /* Max. octets in frame sans FCS */
+#define ETH_DATA_LEN   9000        /* Max. octets in payload    */
+#define ETH_FRAME_LEN  9014        /* Max. octets in frame sans FCS */
 #define ETH_FCS_LEN    4       /* Octets in the FCS         */

 #define ETH_MIN_MTU    68      /* Min IPv4 MTU per RFC791  */

Second, recompile and push over the updated code.

Third, reboot.

geerlingguy commented 3 years ago

After reboot:

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 68:1c:a2:13:4e:ee brd ff:ff:ff:ff:ff:ff
    inet 10.0.100.182/24 brd 10.0.100.255 scope global dynamic noprefixroute eth1
       valid_lft 86333sec preferred_lft 75533sec
    inet6 fe80::e3a1:f9c6:4209:7cf9/64 scope link 
       valid_lft forever preferred_lft forever

And using iperf:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   332 KBytes   272 Kbits/sec    5             sender
[  5]   0.00-10.01  sec  0.00 Bytes  0.00 bits/sec                  receiver

Er... that can't be right!

geerlingguy commented 3 years ago

So... I could get:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.36 GBytes  2.03 Gbits/sec    0             sender
[  5]   0.00-10.01  sec  2.36 GBytes  2.02 Gbits/sec                  receiver

That's with the Pi side MTU set to 9000, and the Mac side at the default, 1500.

When I switch the hardware settings for my OWC TB3 10G port to use Jumbo frames in the macOS GUI, and wait for the interface to re-combobulate itself, it goes down to 0 bits/sec when I do an iperf3 run.

I mean, packets are flying, because I'm controlling this thing over the wired network. So what's up with iperf3??

geerlingguy commented 3 years ago

Seems like maybe the MikroTik switch/router needs adjusting.

Testing with ping 10.0.100.142 -s 1576, I was able to get a packet through. At a packet size of 1577 bytes or more, the pings never went through.
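When probing like this, it helps to know which ping payload size corresponds to a given MTU. The sketch below does that arithmetic for IPv4, assuming a 20-byte IP header with no options and the 8-byte ICMP header; `icmp_payload_for_mtu` is a hypothetical helper name.

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
# icmp_payload_for_mtu is a hypothetical helper name.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

icmp_payload_for_mtu 1500
icmp_payload_for_mtu 9000
```

This prints 1472 and 8972. To test a jumbo path without letting the sender fragment, Linux ping accepts `-M do`, as in `ping -M do -s "$(icmp_payload_for_mtu 9000)" 10.0.100.142`, and macOS ping has `-D` for the same purpose. Without one of those flags, an oversized ping may be fragmented by the sender, so a pass or fail says less about the MTU of the devices in between.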

So I just realized the MikroTik switch defaults (I believe) to 'RouterOS', and I just want it to be a dumb switch. To change that, I have to set my Mac to a manual IP of 192.168.88.2 so I can connect to the router web UI at http://192.168.88.1. I did that, and I think I switched the thing into 'SwOS', but we'll see.

Edit: I did, yay! It's in 'Bridge' mode, and now the switch is getting an IP from my main router.

geerlingguy commented 3 years ago

THERE we go... had to log into the switch, edit each of the ports for my Mac and the Pi, and manually bump them from MTU 1500 and L2 MTU 1592 to MTU 9000 and L2 MTU 9000.

Both pings at 9000 bytes and iperf3 work great now:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.88 GBytes  2.48 Gbits/sec    0             sender
[  5]   0.00-10.01  sec  2.88 GBytes  2.47 Gbits/sec                  receiver

Hooray!

So to recap:

To get max 2.5 Gbps throughput on the Pi

  1. Make sure the Pi supports MTU 9000 (using the patch and recompile above).
  2. Make sure the Mac (or other hardware) is set to MTU 9000.
  3. Make sure any switches/devices between the two support MTU 9000 (the MikroTik did not—it was set to 1500 by default, and had to be manually changed for the ports the Mac and Pi were connected to).
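For the Pi end of that checklist, the MTU the kernel currently reports can be read straight from sysfs. This is a minimal sketch; `iface_mtu` is a hypothetical helper name, and the optional second argument exists only so the function can be pointed at a fake sysfs tree for testing.

```shell
# Print the MTU the kernel currently reports for a network interface.
# $1 = interface name (e.g. eth1)
# $2 = optional sysfs root (defaults to /sys; only useful for testing)
# iface_mtu is a hypothetical helper name.
iface_mtu() {
    cat "${2:-/sys}/class/net/$1/mtu"
}
```

On the setup in this thread, `iface_mtu eth1` should agree with the `mtu 9000` shown by `ip a`. If the driver allows it, the value can also be changed at runtime with `sudo ip link set dev eth1 mtu 9000`. The Mac and each switch port still have to be checked separately.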

Also noting that if you switch from RouterOS to SwitchOS on the MikroTik, you don't have to change any MTU settings; the switch just does it all automatically at that point. The box is a little cooler, too. Neat feature: it has temperature sensors on each of the SFP+ ports, as well as one internally.

geerlingguy commented 3 years ago

The chip also doesn't get super crazy hot. It could sit in a passively cooled enclosure, I think. The MikroTik switch on the other hand...

[Image: IMG_0001]

geerlingguy commented 3 years ago

And atop results with Jumbo Frames:

[Image: Screen Shot 2020-12-22 at 5 43 35 PM]
mayli commented 3 years ago

Nice! How about duplex iperf3 testing of both RX and TX? You can start two iperf3 servers on different ports and run two clients in parallel, one for RX and one for TX.

server: iperf3 -s & iperf3 -s -p 5202
client: iperf3 -c rpi & iperf3 -c rpi -p 5202 -R

geerlingguy commented 3 years ago

@mayli - Good idea! That'll have to wait, unfortunately—I accidentally released the magic smoke on the Realtek chip when testing it in an external powered riser...

[Image: IMG_3129]

mayli commented 3 years ago

RIP $20, and let Red Shirt Jeff fix it.

mi-hol commented 3 years ago

I accidentally released the magic smoke on the Realtek chip when testing it in an external powered riser...

A comment: it sounds like the label "currently testing" should be removed, or are you buying a new one?

We all know errors may happen any time, but I wonder if some risers make it more likely than others to trigger such a human error (i.e. by design)?

geerlingguy commented 3 years ago

@mi-hol - That's my guess. These risers seem to be all over the board in terms of build quality, and none of them have an actual manufacturer attached... there also seem to be dozens of revisions with different features (like different power circuits), and I wonder if this particular one had some wires crossed.

The chip popped when I plugged in the CM4 to its 12 V power supply, so I'm wondering if there was a surge through its 12 V PCIe rail that doubled the voltage into the card for some reason.

The 2.5 GbE adapter doesn't seem to have any extra power filtering like I see on most other PCIe boards... so that could also be something where the card is supposed to accept a wider range of voltages/wattage and it just got too much straight into the chip.

geerlingguy commented 3 years ago

From Fedor Suchkov on YouTube:

Have you tried UDP mode (-u with -b) and changing window size for tcp (-w) ?

geerlingguy commented 3 years ago

lol, just a couple days after I let out the magic smoke, AvE mentioned the importance of said smoke in his latest BOLTR video: https://youtu.be/x0UGErQDKPw?t=65

geerlingguy commented 3 years ago

NOW THERE ARE TWO! https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/46

(Also, I do have a replacement on hand since two days ago... haven't had the time to pick back up on testing it. There was this big Christmas holiday thing that required some of my attention :P)

geerlingguy commented 3 years ago

@mayli - I tested with --bidir to test TX/RX—this requires a newer version of iperf3 than what's available via Pi OS packages, so I compiled iperf3 from source:

# Compile iperf3 from source:
wget https://github.com/esnet/iperf/archive/master.zip
unzip master.zip
cd iperf-master/
./configure
make
cd src/
./iperf3 --help

Using that, it seems like the result still maxes out around 2.47 Gbps (with MTU 9000) TX, and ~70 Mbps RX.

[ ID][Role] Interval           Transfer     Bitrate
[  5][RX-S]   0.00-10.01  sec  2.88 GBytes  2.47 Gbits/sec                  receiver
[  9][TX-S]   0.00-10.01  sec  87.6 MBytes  73.4 Mbits/sec                  sender

Shouldn't that be a bit more symmetrical? I'm showing full duplex for the connections on both my Mac and the Pi. And using atop, I'm not seeing any contention on IRQ / CPU.

ethtool is showing it's connected as full duplex:

$ sudo ethtool eth1
Settings for eth1:
    Supported ports: [ TP MII ]
    Supported link modes:   10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
                            2500baseT/Full 
    Supported pause frame use: Symmetric Receive-only
    Supports auto-negotiation: Yes
    Supported FEC modes: Not reported
    Advertised link modes:  10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
                            2500baseT/Full 
    Advertised pause frame use: Symmetric Receive-only
    Advertised auto-negotiation: Yes
    Advertised FEC modes: Not reported
    Link partner advertised link modes:  10baseT/Half 10baseT/Full 
                                         100baseT/Half 100baseT/Full 
                                         1000baseT/Half 1000baseT/Full 
                                         10000baseT/Full 
                                         2500baseT/Full 
                                         5000baseT/Full 
    Link partner advertised pause frame use: No
    Link partner advertised auto-negotiation: Yes
    Link partner advertised FEC modes: Not reported
    Speed: 2500Mb/s
    Duplex: Full
    Port: MII
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: pumbg
    Wake-on: d
    Link detected: yes
geerlingguy commented 3 years ago

Responding to the YouTube comment about testing with -u for UDP, I ran:

$ ./iperf3 -c 10.0.100.143 -u -b 0
Connecting to host 10.0.100.143, port 5201
[  5] local 10.0.100.73 port 37468 connected to 10.0.100.143 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec   292 MBytes  2.45 Gbits/sec  34270  
[  5]   1.00-2.00   sec   296 MBytes  2.48 Gbits/sec  34670  
[  5]   2.00-3.00   sec   296 MBytes  2.48 Gbits/sec  34660  
[  5]   3.00-4.00   sec   296 MBytes  2.48 Gbits/sec  34670  
[  5]   4.00-5.00   sec   296 MBytes  2.48 Gbits/sec  34670  
[  5]   5.00-6.00   sec   296 MBytes  2.48 Gbits/sec  34670  
[  5]   6.00-7.00   sec   296 MBytes  2.48 Gbits/sec  34660  
[  5]   7.00-8.00   sec   296 MBytes  2.48 Gbits/sec  34670  
[  5]   8.00-9.00   sec   296 MBytes  2.48 Gbits/sec  34670  
[  5]   9.00-10.00  sec   296 MBytes  2.48 Gbits/sec  34670  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  2.89 GBytes  2.48 Gbits/sec  0.000 ms  0/346280 (0%)  sender
[  5]   0.00-10.00  sec  2.89 GBytes  2.48 Gbits/sec  0.029 ms  0/346275 (0%)  receiver

iperf Done.

Bidirectional was a bit more wonky:

[ ID][Role] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5][TX-C]   0.00-10.00  sec  2.27 GBytes  1.95 Gbits/sec  0.000 ms  0/272880 (0%)  sender
[  5][TX-C]   0.00-10.00  sec  1.48 GBytes  1.27 Gbits/sec  0.038 ms  95040/272743 (35%)  receiver
[  7][RX-C]   0.00-10.00  sec  13.6 GBytes  11.7 Gbits/sec  0.000 ms  0/1634817 (0%)  sender
[  7][RX-C]   0.00-10.00  sec   421 MBytes   353 Mbits/sec  0.412 ms  1582302/1631682 (97%)  receiver

If I limited the bitrate (-b 100M), it would stop losing packets. It seems like somewhere between a 100M and 1000M datarate it starts losing a lot of packets over UDP in bidirectional mode.

geerlingguy commented 3 years ago

A blog post inspired by this issue: Setting 9000 MTU (Jumbo Frames) on Raspberry Pi OS

geerlingguy commented 3 years ago

An interesting finding in the Pi Forums too (which explains why some of my iperf3 runs seem to have 30% worse performance):

when i was checking my pi4 a few days ago, i noticed 2 interesting facts

1: tx is far cheaper on cpu, while rx takes a lot of processing power
2: the genet rx irq is pinned to core0, and iperf3 pins itself to a random core
if iperf3 is pinned to core0, the 2 fight over the cpu some, and you lose about 20-30% of your bandwidth
but if iperf3 is pinned to any other core, then you nearly max out both cores, and get better receive performance

you can manually pin iperf3 to any core you want, then you can confirm the difference between each core and make it more predictable

Source: https://www.raspberrypi.org/forums/viewtopic.php?p=1736824#p1736824
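The pinning described in that forum quote can be tried with standard tools. The sketch below is one way to do it, not necessarily what the forum poster used: `irq_lines` and `pinned_iperf3` are hypothetical helper names, and the pattern that identifies the NIC's interrupt (for example the interface name) is an assumption that may differ per driver.

```shell
# Show the /proc/interrupts lines matching a pattern, with their per-CPU counts.
# $1 = pattern (e.g. the interface name)
# $2 = optional file (defaults to /proc/interrupts; only useful for testing)
# irq_lines is a hypothetical helper name.
irq_lines() {
    grep -- "$1" "${2:-/proc/interrupts}"
}

# Run iperf3 restricted to one CPU core using taskset from util-linux.
# $1 = core number, remaining arguments are passed to iperf3 unchanged.
# pinned_iperf3 is a hypothetical helper name.
pinned_iperf3() {
    core="$1"
    shift
    taskset -c "$core" iperf3 "$@"
}
```

For example, `irq_lines eth` shows which CPU column is accumulating the network interrupts, and `pinned_iperf3 2 -s` then starts a server on core 2. iperf3 also has its own `-A` (`--affinity`) option that does the same job without taskset.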

geerlingguy commented 3 years ago

Once I have a video and blog post up, I should add links to them from here as well as the page for this card on the site.

geerlingguy commented 3 years ago

Blog post: https://www.jeffgeerling.com/blog/2020/testing-25-gbps-ethernet-on-raspberry-pi-cm4

Video: https://www.youtube.com/watch?v=wCbQQ5-sjGM

stevefan1999-personal commented 3 years ago

Have you tried manually tweaking the IRQ balance to "round-robin" interrupts across different cores? It is a pain in the arse to do, however.

geerlingguy commented 3 years ago

@stevefan1999-personal - Yes, check out the efforts in https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/3 — it seems like it's impossible to do on the Pi.

johntdavis84 commented 3 years ago

Jeff,

Thank you for all this, as well as the article on your website and the details on the Raspberry Pi forum. It was very helpful--especially to know what results I should expect at various settings.

I'm using the Sabrent version of the USB 3 adapter carrying this chipset, and am getting a pretty solid 1.8 Gbps via iperf3 with these settings:

  1. Overclock: 2.0 GHz
  2. Jumbo Frames: 9000 @ Pi, switch, Mac (iPerf server)

I was actually getting the same speed results at MTU 1500, and did not see any improvement when I was at MTU 1500 with the CPU overclocked to 2.147 GHz. I did pull back from that and return to 2.0 GHz, as the system seemed more unstable at 2.147 GHz.

I'm still running with Jumbo Frames, though, as I did see a drastic drop in CPU usage. At MTU 1500, iperf3 would peg one core to 100 percent and take a good chunk out of another one. With MTU 9000, iPerf uses 40-60 percent on core 1, and 30-50 percent on core 2.

I don't have atop installed, but I'm guessing higher MTU is preventing a CPU bottleneck, which seems to be worth it. (I have no idea. I've never touched MTU in my life until today.)

EDIT: Another issue with the USB 3.0 adapter on the Pi is that the default USB autosuspend behavior drops the connection to the adapter from time to time.

I think I've got that solved via a udev rule, but I'm not terribly familiar with those, and won't know if it worked until I find out whether my Mac loses SSH connectivity to the Pi overnight and I see the reset message in dmesg.

This is what I'm trying:

rules.d]$ cat 15-power-custom.rules
ACTION=="add|change", SUBSYSTEM=="usb", ATTR{idVendor}=="0bda", ATTR{idProduct}=="8156", TEST=="power/control", ATTR{power/control}="on"

It looks like it's not working, but autosuspend is set to 0, so I have no idea.
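One way to see whether a rule like that took effect is to read the device's power/control attribute directly from sysfs, where "on" means runtime power management is disabled for the device and "auto" means the kernel may suspend it. The sketch below assumes the same vendor and product IDs as the rule above; `usb_power_control` is a hypothetical helper name, and the optional sysfs root is only there for testing.

```shell
# Print the power/control setting of every USB device matching vendor:product.
# $1 = idVendor, $2 = idProduct (hex strings as shown by lsusb)
# $3 = optional sysfs root (defaults to /sys; only useful for testing)
# usb_power_control is a hypothetical helper name.
usb_power_control() {
    for dev in "${3:-/sys}"/bus/usb/devices/*; do
        # Interface entries have no idVendor file, so skip them.
        [ -r "$dev/idVendor" ] || continue
        [ "$(cat "$dev/idVendor")" = "$1" ] || continue
        [ "$(cat "$dev/idProduct")" = "$2" ] || continue
        echo "$dev: $(cat "$dev/power/control")"
    done
}
```

Here `usb_power_control 0bda 8156` would print one line per matching adapter. If it still reports "auto", the rule may not have been loaded yet; `sudo udevadm control --reload` followed by `sudo udevadm trigger`, or re-plugging the adapter, should make udev apply it.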

[Image]
geerlingguy commented 2 years ago

I think I can close this off as it's working as expected :)

CeeJayDK commented 2 years ago

I just bought a Raidsonic ICY BOX IB-LAN300-PCI 2.5 Gigabit Ethernet PCIe card and it looks identical to this card except for a different QC sticker. Everything else is exactly the same.

So it appears to be a rebrand of this card.