Since you have two, you should try to see if RoCE can work. It would be awesome to have an RDMA-enabled Pi, and it would help remove the IRQ issue you had in the other videos. If all of this works, I'll be pleased to see an InfiniBand network on it :)
@albydnc - Heh, I'll see what I can do!
@geerlingguy let me know if you need some help, I work on infiniband and rdma
$ sudo lspci -vvvv
01:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
Subsystem: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 0
Region 0: Memory at 600800000 (64-bit, non-prefetchable) [disabled] [size=1M]
Region 2: Memory at 600000000 (64-bit, prefetchable) [disabled] [size=8M]
[virtual] Expansion ROM at 600900000 [disabled] [size=1M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] Vital Product Data
Product Name: ConnectX-2 SFP+
Read-only fields:
[PN] Part number: MNPA19-XTR
[EC] Engineering changes: A2
[SN] Serial number: MT1148X12321
[V0] Vendor specific: PCIe Gen2 x8
[RV] Reserved: checksum good, 0 byte(s) reserved
Read/write fields:
[V1] Vendor specific: N/A
[YA] Asset tag: N/A
[RW] Read-write area: 105 byte(s) free
End
Capabilities: [9c] MSI-X: Enable- Count=128 Masked-
Vector table: BAR=0 offset=0007c000
PBA: BAR=0 offset=0007d000
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #8, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [148 v1] Device Serial Number 00-02-c9-03-00-53-00-fa
$ dmesg
...
[ 1.217953] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
[ 1.217995] brcm-pcie fd500000.pcie: No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
[ 1.218079] brcm-pcie fd500000.pcie: MEM 0x0600000000..0x063fffffff -> 0x00c0000000
[ 1.218178] brcm-pcie fd500000.pcie: IB MEM 0x0000000000..0x00ffffffff -> 0x0100000000
[ 1.282343] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
[ 1.282710] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
[ 1.282742] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 1.282770] pci_bus 0000:00: root bus resource [mem 0x600000000-0x63fffffff] (bus address [0xc0000000-0xffffffff])
[ 1.282866] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
[ 1.283113] pci 0000:00:00.0: PME# supported from D0 D3hot
[ 1.286752] pci 0000:00:00.0: bridge configuration invalid ([bus ff-ff]), reconfiguring
[ 1.400521] pci 0000:01:00.0: [15b3:6750] type 00 class 0x020000
[ 1.400803] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x000fffff 64bit]
[ 1.400979] pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x007fffff 64bit pref]
[ 1.401254] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x000fffff pref]
[ 1.402247] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x8 link)
[ 1.405844] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[ 1.405903] pci 0000:00:00.0: BAR 9: assigned [mem 0x600000000-0x6007fffff 64bit pref]
[ 1.405933] pci 0000:00:00.0: BAR 8: assigned [mem 0x600800000-0x6009fffff]
[ 1.405965] pci 0000:01:00.0: BAR 2: assigned [mem 0x600000000-0x6007fffff 64bit pref]
[ 1.406122] pci 0000:01:00.0: BAR 0: assigned [mem 0x600800000-0x6008fffff 64bit]
[ 1.406275] pci 0000:01:00.0: BAR 6: assigned [mem 0x600900000-0x6009fffff pref]
[ 1.406303] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 1.406334] pci 0000:00:00.0: bridge window [mem 0x600800000-0x6009fffff]
[ 1.406363] pci 0000:00:00.0: bridge window [mem 0x600000000-0x6007fffff 64bit pref]
Trying this driver first: https://www.mellanox.com/products/ethernet-drivers/linux/mlnx_en
$ wget http://www.mellanox.com/downloads/ofed/MLNX_EN-5.1-1.0.4.0/mlnx-en-5.1-1.0.4.0-debian10.3-aarch64.tgz
$ tar xvf mlnx-en-5.1-1.0.4.0-debian10.3-aarch64.tgz
$ cd mlnx-en-5.1-1.0.4.0-debian10.3-aarch64/
$ sudo ./install
Error: The current mlnx-en is intended for debian10.3
How unfortunate :P
Digging through the installer, I found --skip-distro-check as an available option.
$ sudo ./install --skip-distro-check
System has one or more unsupported device, see below.
MLNX_OFED / mlnx_en 5.1 and above supports only ConnectX-4 or newer devices.
This device could become unavailable which might result in loss of connectivity.
Use --skip-unsupported-devices-check to skip this check.
Aborting.
* 01:00.0 Ethernet controller [0200]: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] [15b3:6750] (rev b0)
No support for older cards? What madness! Let's try with:
$ sudo ./install --skip-distro-check --skip-unsupported-devices-check
Now it's attempting to install extra stuff:
Checking SW Requirements...
One or more required packages for installing mlnx-en are missing.
/lib/modules/5.10.3-v8+/build/scripts is required for the Installation.
Attempting to install the following missing packages:
autotools-dev graphviz autoconf chrpath linux-headers-5.10.3-v8+ dpatch lsof dkms m4 automake quilt debhelper swig libltdl-dev
Failed command: apt-get install -y autotools-dev graphviz autoconf chrpath linux-headers-5.10.3-v8+ dpatch lsof dkms m4 automake quilt debhelper swig libltdl-dev
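A guess at the cause of that apt failure: Raspberry Pi OS doesn't publish versioned linux-headers-* packages, so the installer's package list can never resolve. A possible workaround, assuming you're on the stock Pi OS kernel (for a self-built kernel you'd point the build symlink at your own source tree instead), would be something like:
$ sudo apt install -y raspberrypi-kernel-headers
$ sudo apt install -y autotools-dev graphviz autoconf chrpath dpatch lsof dkms m4 automake quilt debhelper swig libltdl-dev
# For a self-compiled kernel, point the headers symlink at your own kernel source tree:
$ sudo ln -sf /path/to/kernel/source /lib/modules/$(uname -r)/build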
Why don't all these device manufacturers account for Raspberry Pi OS 64-bit beta? 🤔
Anyways, going to stop for now and get back at it later. At least I have the card identified. It works through my 1x-to-16x adapter, but it wasn't showing up when I tried powering it through my external adapter...
I also received a Noctua fan PWM controller today. Nice for my ears to not run the 12V fan at maximum speed all day :D
@albydnc - I may ask for some help figuring out a good test / benchmark for RDMA, as I know a lot of people may be interested in whether the Pi can support it.
@geerlingguy you can use the default benchmarks available with the Mellanox driver: perftest. You can also look at the source on GitHub. This is the optimal condition for testing the performance of the network, since the benchmarks are written against the low-level C API for RDMA (InfiniBand verbs). For a more general test, I suggest using MPI benchmarks, so you can easily compare various technologies; you will see a drop in performance, but it shouldn't be significant.
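For reference, a minimal perftest run would look something like the sketch below; the device name mlx4_0 and the address 10.0.0.1 are placeholders, and this assumes perftest (or a build from https://github.com/linux-rdma/perftest) is installed and the card shows up in ibv_devices.
# Server side (hypothetical RDMA device name mlx4_0):
$ ib_write_bw -d mlx4_0 --report_gbits

# Client side, pointing at the server's IP (10.0.0.1 is a placeholder):
$ ib_write_bw -d mlx4_0 --report_gbits 10.0.0.1

# Latency test instead of bandwidth:
$ ib_write_lat -d mlx4_0 10.0.0.1

# For RoCE you may also need rdma_cm connection setup (-R) and/or an explicit GID index (-x).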
Trying this driver first: https://www.mellanox.com/products/ethernet-drivers/linux/mlnx_en
$ wget http://www.mellanox.com/downloads/ofed/MLNX_EN-5.1-1.0.4.0/mlnx-en-5.1-1.0.4.0-debian10.3-aarch64.tgz
$ tar xvf mlnx-en-5.1-1.0.4.0-debian10.3-aarch64.tgz
$ cd mlnx-en-5.1-1.0.4.0-debian10.3-aarch64/
$ sudo ./install
Error: The current mlnx-en is intended for debian10.3
How unfortunate :P
you lost me here :( Why would the available debian10.0 driver have no chance of working?
@mi-hol - It seems like that install script is a giant bash script with a lot of points of entanglement where it looks for exact strings in the returned information. Pi OS, and especially the Pi OS 64-bit beta, doesn't behave identically to Debian 10.3 / Debian 10.
The Ubuntu installer might have better success, but honestly, the drivers have a ton of warnings and checks and things that try to force you to use ConnectX-4 or later generation cards... I'm thinking compiling the driver into the kernel would be easier, since the kernel isn't as preachy about making you buy the latest generation of card.
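Before rebuilding anything, it may be worth checking whether the running kernel already has the in-tree mlx4 driver available. A quick sketch, assuming the Pi kernel exposes its config through the configs module:
# Is the module shipped with the running kernel?
$ modinfo mlx4_en 2>/dev/null || echo "mlx4_en not available"

# Inspect the kernel config (Pi OS kernels provide /proc/config.gz once configs is loaded):
$ sudo modprobe configs
$ zcat /proc/config.gz | grep -E 'CONFIG_MLX4_(CORE|EN)'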
So the ConnectX-2, while still interesting, is not something you'll want to waste your time on. Mellanox dropped driver support ages ago, so you miss out on the most interesting feature of Mellanox NICs: RDMA. You should be able to get a ConnectX-3 on eBay for cheap and get all the nice modern features. If you want to try it, I'm willing to buy one for you @geerlingguy
@albydnc - I figured as much... and I would gladly take you up on that offer! If you can DM me on Twitter, or email me (my email is on my website about page), I can sort out the details. And I'll happily plug your Twitter/name/whatever in an eventual video I make on 10G networking on the Pi (whether or not I can get the X3 working! I already have the ASUS card going).
Jeff, just to (re-)pique your interest in the Mellanox cards, I have the dual NIC versions of the same venerable beasties:
$ lspci -nn | grep Mellanox
01:00.0 Ethernet controller [0200]: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] [15b3:6750] (rev b0)
CentOS 7 worked a treat with these, but CentOS Stream dropped support, as I discovered when upgrading a couple of weeks ago. You should note that Linux, at least, uses the mlx4 driver for these parts.
In my case the issue was "as simple as" the drivers having GEN2 support #define'd out. I wrote some notes to self.
On my RPi4 running 64-bit there's barely support for any ethernet device:
$ ls /lib/modules/$(uname -r)/kernel/drivers/net/ethernet/
microchip qualcomm wiznet
but there are some 4.x kernels lying about (no idea why) which do have full MLX4 support. In particular you can grep out this particular card (using the PCI vendor and product IDs from lspci -nn above):
$ modinfo /lib/modules/4.19.0-16-arm64/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko | grep -i 15b3 | grep -i 6750
alias: pci:v000015B3d00006750sv*sd*bc*sc*i*
So I'm going to guess that support is entirely feasible.
Until last week I'd not compiled anything kernel-y before but I guess the process is similar on the RPi.
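For what it's worth, enabling mlx4 in a Raspberry Pi kernel build should look roughly like the sketch below, following the official on-device 64-bit build steps; the rpi-5.10.y branch name and the -j4 job count are assumptions, and the MLX4 options are the ones left off in the default config.
$ sudo apt install -y git bc bison flex libssl-dev make
$ git clone --depth=1 --branch rpi-5.10.y https://github.com/raspberrypi/linux
$ cd linux
$ make bcm2711_defconfig
# Turn on the ConnectX-2/3 Ethernet driver as modules:
$ ./scripts/config --module MLX4_CORE --module MLX4_EN
$ make olddefconfig
$ make -j4 Image modules dtbs
$ sudo make modules_install
$ sudo cp arch/arm64/boot/Image /boot/kernel8.img
$ sudo cp arch/arm64/boot/dts/broadcom/*.dtb /boot/
$ sudo reboot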
@ianfitchet - Thanks! I'll keep that in mind next time I get back to this card—for now I'm switching my sights over to the ConnectX-3 I just got (see #143).
Just tried with same freshly-compiled kernel I tested in #143 with a ConnectX-3 adapter, and getting the exact same error:
[ 28.219483] mlx4_en: eth1: Link Up
[ 43.997574] ------------[ cut here ]------------
[ 43.997620] NETDEV WATCHDOG: eth1 (mlx4_core): transmit queue 0 timed out
[ 43.997703] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x3a0/0x3a8
[ 43.997710] Modules linked in: bnep hci_uart btbcm bluetooth ecdh_generic ecc mlx4_en 8021q garp stp llc vc4 brcmfmac cec brcmutil drm_kms_helper v3d cfg80211 gpu_sched bcm2835_codec(C) rfkill bcm2835_v4l2(C) drm bcm2835_isp(C) v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common drm_panel_orientation_quirks raspberrypi_hwmon videodev mlx4_core vc_sm_cma(C) mc snd_bcm2835(C) i2c_brcmstb snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd syscopyarea rpivid_mem sysfillrect sysimgblt fb_sys_fops backlight uio_pdrv_genirq uio nvmem_rmem aes_neon_bs sha256_generic aes_neon_blk crypto_simd cryptd ip_tables x_tables ipv6
[ 43.998043] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G C 5.10.39-v8+ #1
[ 43.998050] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 43.998062] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[ 43.998071] pc : dev_watchdog+0x3a0/0x3a8
[ 43.998078] lr : dev_watchdog+0x3a0/0x3a8
[ 43.998085] sp : ffffffc0115bbd10
[ 43.998092] x29: ffffffc0115bbd10 x28: ffffff804b0d3f40
[ 43.998108] x27: 0000000000000004 x26: 0000000000000140
[ 43.998124] x25: 00000000ffffffff x24: 0000000000000002
[ 43.998139] x23: ffffffc011286000 x22: ffffff804b0a03dc
[ 43.998154] x21: ffffff804b0a0000 x20: ffffff804b0a0480
[ 43.998168] x19: 0000000000000000 x18: 0000000000000000
[ 43.998183] x17: 0000000000000000 x16: 0000000000000000
[ 43.998198] x15: ffffffffffffffff x14: ffffffc011288948
[ 43.998213] x13: ffffffc01146ebd0 x12: ffffffc011315430
[ 43.998227] x11: 0000000000000003 x10: ffffffc0112fd3f0
[ 43.998242] x9 : ffffffc0100e5358 x8 : 0000000000017fe8
[ 43.998256] x7 : c0000000ffffefff x6 : 0000000000000003
[ 43.998270] x5 : 0000000000000000 x4 : 0000000000000000
[ 43.998285] x3 : 0000000000000103 x2 : 0000000000000102
[ 43.998299] x1 : 730045c0bcfb7500 x0 : 0000000000000000
[ 43.998314] Call trace:
[ 43.998324] dev_watchdog+0x3a0/0x3a8
[ 43.998339] call_timer_fn+0x38/0x200
[ 43.998349] run_timer_softirq+0x298/0x548
[ 43.998358] __do_softirq+0x1a8/0x510
[ 43.998369] irq_exit+0xe8/0x108
[ 43.998378] __handle_domain_irq+0xa0/0x110
[ 43.998386] gic_handle_irq+0xb0/0xf0
[ 43.998393] el1_irq+0xc8/0x180
[ 43.998407] arch_cpu_idle+0x18/0x28
[ 43.998416] default_idle_call+0x58/0x1d4
[ 43.998427] do_idle+0x25c/0x270
[ 43.998437] cpu_startup_entry+0x30/0x70
[ 43.998448] secondary_start_kernel+0x170/0x180
[ 43.998456] ---[ end trace 257c7cb4ef196f12 ]---
[ 43.998490] mlx4_en: eth1: TX timeout on queue: 0, QP: 0x208, CQ: 0x84, Cons: 0xffffffff, Prod: 0x1
[ 44.046185] mlx4_en: eth1: Steering Mode 1
[ 44.052169] mlx4_en: eth1: Link Down
[ 46.301966] mlx4_en: eth1: Link Up
[ 61.917527] mlx4_en: eth1: TX timeout on queue: 2, QP: 0x20a, CQ: 0x86, Cons: 0xffffffff, Prod: 0x1
[ 61.949949] mlx4_en: eth1: Steering Mode 1
[ 61.970419] mlx4_en: eth1: Link Down
[ 64.379433] mlx4_en: eth1: Link Up
The lights flash, things seem to work, but it keeps re-connecting :(
$ ip a
...
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:02:c9:4e:e2:fa brd ff:ff:ff:ff:ff:ff
inet 169.254.135.78/16 brd 169.254.255.255 scope global noprefixroute eth1
valid_lft forever preferred_lft forever
inet6 fe80::25c8:7bfd:2254:dad4/64 scope link
valid_lft forever preferred_lft forever
Marking this as done... can't find any way to get the thing working, unfortunately.
(I have since confirmed these cards work fine in a few different PCs, though.)
If you still have one of these cards around, turning off tx/rx flow control (pause frames) may work via ethtool -A DEVICE rx off tx off, which I've had to do for ConnectX-2 cards on some PC installations that see similar timeout and link down/up behavior. The card is also designed to work with multiple tx/rx queues to split traffic among CPUs. Maybe that's interfering with something as well, and you could try using ethtool -l DEVICE and ethtool -L DEVICE rx RXCHNUM tx TXCHNUM to tweak the channel count.
Incidentally, I've also had these cards silently fail when I try to use an MTU larger than 4032 on Ethernet, but I don't know if you've done anything in that regard.
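Putting those suggestions together, a possible sequence to try would be something like this (eth1 and the channel counts are placeholders):
# Check current pause-frame and channel settings:
$ ethtool -a eth1
$ ethtool -l eth1

# Disable flow control (pause frames):
$ sudo ethtool -A eth1 rx off tx off

# Drop to a single TX/RX queue each way while debugging:
$ sudo ethtool -L eth1 rx 1 tx 1

# Keep the MTU at or below 4032 while testing:
$ sudo ip link set eth1 mtu 1500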
I can still use the Mellanox MT26448 with the latest Fedora (39) without any issues in 2024. On the other hand, Rocky Linux is not compatible with this NIC. Strange.
@kadir-gunel Maybe Rocky Linux has inherited the problem mentioned above? https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/21#issuecomment-842959112
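One quick way to check whether a given distro's kernel still ships the mlx4 driver and has bound it to the card (just a sketch; the PCI ID is the one from the lspci output above):
# Does the installed kernel provide the module at all?
$ modinfo mlx4_en | head -n 3

# Is it loaded, and which driver is bound to the NIC?
$ lsmod | grep mlx4
$ lspci -k -d 15b3:6750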
Thanks to Jacob Hiltz, I have two Mellanox ConnectX-2 EN MNPA19-XTR SFP+ 10G Ethernet Adapter cards pulled from Dell servers to test with the CM4. He also sent me two Cisco SFP-H10GB-CU3M Passive Twinax cables, which means all I need to do is drop that card into my MikroTik 10G 4-port switch, plug in my MacBook Pro via my OWC 10G ThunderBolt 3 adapter, and see what happens.
Mellanox links:
drivers/net/ethernet/mellanox/mlx5 (in-kernel driver directory)

Related: