acooks / tn40xx-driver

Linux driver for tn40xx from Tehuti Networks

StarTech ST10GSPEXNDP dual port naming support. #28

Open zenczykowski opened 4 years ago

zenczykowski commented 4 years ago

This is on a Debian-derived 5.2.17-1rodete3-amd64 system.

[161345.272682] Tehuti Network Driver, 0.3.6.17.2
[161345.272687] Supported phys : MV88X3120 MV88X3310 QT2025 TLK10232 AQR105 MUSTANG
[161345.272827] tn40xx 0000:05:00.0: enabling device (0100 -> 0102)
[161345.273004] srom 0x0 HWver 16 build 0 lane# 4 max_pl 0x1 mrrs 0x3
[161345.512548] PHY detected on port 0 ID=2B09AB - MV88X3310 (A1) 10Gbps 10GBase-T
[161349.016591] MV88X3310 initdata applied
[161349.016681] MV88X3310 I/D version is 0.3.4.0
[161349.136631] tn40xx 0000:05:00.0 ens6: renamed from eth1 <-- should be ens6f0
[161349.217708] fw 0xe
[161349.217717] ens6, Port A
[161349.217864] 1 1fc9:4027:1fc9:3015
[161349.217865] 2 1fc9:4027:1fc9:3015
[161349.218018] detected 2 cards, 1 loaded
[161349.218157] tn40xx 0000:06:00.0: enabling device (0100 -> 0102)
[161349.218269] srom 0x0 HWver 16 build 0 lane# 4 max_pl 0x1 mrrs 0x3
[161349.452670] PHY detected on port 0 ID=2B09AB - MV88X3310 (A1) 10Gbps 10GBase-T
[161352.728538] MV88X3310 initdata applied
[161352.728625] MV88X3310 I/D version is 0.3.4.0
[161352.929688] fw 0xe
[161352.929697] eth1, Port A
[161352.929838] 1 1fc9:4027:1fc9:3015
[161352.929839] 2 1fc9:4027:1fc9:3015
[161352.929992] detected 2 cards, 2 loaded

Here's how a StarTech ST10GPEXNDPI nic (this is a 2-port PCIe 10GBase-T / NBASE-T Ethernet Network Card - with Intel X550 Chip) shows up:

4: ens4f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
5: ens4f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000

Here's how the above Tehuti-based card shows up:

7: ens6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 3000
8: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 3000

Basically this should have been ens6f0 (not ens6) and ens6f1 (not eth1).

I'm not entirely sure what causes the issue... is this a driver problem? An incorrect uevent issue? Missing udev config?

Obviously the intel (ixgbe) nic gets it right...

acooks commented 4 years ago

Looks like it shows up as two distinct PCI devices, namely 0000:05:00.0 and 0000:06:00.0, as opposed to the Intel X550 case, where there are two functions on the same device, e.g. 0000:05:00.0 and 0000:05:00.1 (if it were plugged into the same slot). You can check the output of lspci to confirm.
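
To make the distinction concrete, here is a small sketch (not from the thread) that splits a full PCI address of the form domain:bus:device.function and checks whether two addresses are functions of one device. The helper names pci_parts and same_device are made up for illustration, and the code assumes the full form with the domain prefix, as printed in the dmesg output above.

```shell
# Hypothetical helper: split a full PCI address (DDDD:BB:DD.F) into its parts.
# Assumes the domain prefix is present, e.g. 0000:05:00.0 rather than 05:00.0.
pci_parts() {
    addr=$1
    func=${addr##*.}     # text after the last "."
    rest=${addr%.*}      # DDDD:BB:DD
    dev=${rest##*:}      # device number
    rest=${rest%:*}      # DDDD:BB
    bus=${rest##*:}      # bus number
    domain=${rest%:*}    # domain (segment)
    echo "domain=$domain bus=$bus device=$dev function=$func"
}

# Hypothetical helper: two addresses are functions of one device when
# everything before the "." (domain, bus and device) is identical.
same_device() {
    [ "${1%.*}" = "${2%.*}" ]
}
```

With the addresses reported in this thread, same_device succeeds for the X550 pair 0000:84:00.0 and 0000:84:00.1, and fails for the Tehuti pair 0000:05:00.0 and 0000:06:00.0, which sit on different buses.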

Maybe the userspace application that renames the devices doesn't know what to do, possibly because the hardware topology doesn't match the implied rules of the naming convention.

Could you have a look at https://wiki.debian.org/NetworkInterfaceNames and help us understand what your system is configured to do?

zenczykowski commented 4 years ago

$ lspci -nn | egrep -i Ether
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I218-LM [8086:15a0] (rev 05)
05:00.0 Ethernet controller [0200]: Tehuti Networks Ltd. TN9710P 10GBase-T/NBASE-T Ethernet Adapter [1fc9:4027]
06:00.0 Ethernet controller [0200]: Tehuti Networks Ltd. TN9710P 10GBase-T/NBASE-T Ethernet Adapter [1fc9:4027]
08:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
84:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10G X550T [8086:1563] (rev 01)
84:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10G X550T [8086:1563] (rev 01)

zenczykowski commented 4 years ago

I think the local config is:

$ cat /etc/udev/rules.d/70-persistent-net.rules | egrep -v '^#|^ *$'
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="a0:....", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="a0:....", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eno1", ENV{NM_UNMANAGED}="1"
SUBSYSTEM=="net", ACTION=="add", KERNEL=="usb*", ENV{NM_UNMANAGED}="1"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="r8152", ENV{NM_UNMANAGED}="1"
ENV{NM_UNMANAGED}=="1", RUN+="/usr/local/google/home/maze/bin/private auto"

so it has nothing specific to either of the 10gbit nics

zenczykowski commented 4 years ago

$ for i in /sys/class/net/e*; do echo --$i--; udevadm test-builtin net_id $i 2>/dev/null; done
--/sys/class/net/eno1--
ID_NET_NAMING_SCHEME=v243
ID_NET_NAME_MAC=enxa08....
ID_OUI_FROM_DATABASE=Hewlett Packard
ID_NET_NAME_ONBOARD=eno1
ID_NET_LABEL_ONBOARD=LAN (AMT)
ID_NET_NAME_PATH=enp0s25
--/sys/class/net/ens4f0--
ID_NET_NAMING_SCHEME=v243
ID_NET_NAME_MAC=enx000....
ID_OUI_FROM_DATABASE=Sunrich Technology Limited
ID_NET_NAME_PATH=enp132s0f0
ID_NET_NAME_SLOT=ens4f0
--/sys/class/net/ens4f1--
ID_NET_NAMING_SCHEME=v243
ID_NET_NAME_MAC=enx000....
ID_OUI_FROM_DATABASE=Sunrich Technology Limited
ID_NET_NAME_PATH=enp132s0f1
ID_NET_NAME_SLOT=ens4f1
--/sys/class/net/ens6--
ID_NET_NAMING_SCHEME=v243
ID_NET_NAME_MAC=enx000....
ID_OUI_FROM_DATABASE=Sunrich Technology Limited
ID_NET_NAME_PATH=enp5s0
ID_NET_NAME_SLOT=ens6
--/sys/class/net/eth0--
ID_NET_NAMING_SCHEME=v243
ID_NET_NAME_MAC=enxa08....
ID_OUI_FROM_DATABASE=Hewlett Packard
ID_NET_NAME_PATH=enp8s0
ID_NET_NAME_SLOT=ens8191
--/sys/class/net/eth1--
ID_NET_NAMING_SCHEME=v243
ID_NET_NAME_MAC=enx000....
ID_OUI_FROM_DATABASE=Sunrich Technology Limited
ID_NET_NAME_PATH=enp6s0
ID_NET_NAME_SLOT=ens6

zenczykowski commented 4 years ago

So it looks like ID_NET_NAME_PATH is unique but ID_NET_NAME_SLOT is not: both Tehuti ports get ens6.
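
A rough sketch of how the collision could be spotted automatically, for anyone checking another machine. The function name find_dup_slots is made up here; it only post-processes "interface slot-name" pairs, and the commented loop reuses the udevadm test-builtin net_id call shown above to produce them.

```shell
# Hypothetical helper: read "interface slot_name" pairs on stdin and print
# every slot name that more than one interface would receive.
find_dup_slots() {
    awk '
        {
            count[$2]++
            ifaces[$2] = ifaces[$2] (ifaces[$2] ? "," : "") $1
        }
        END {
            for (s in count)
                if (count[s] > 1)
                    print s ": " ifaces[s]
        }
    ' | sort
}

# Feeding it from udev (same command as in the comment above):
#   for i in /sys/class/net/e*; do
#       slot=$(udevadm test-builtin net_id "$i" 2>/dev/null |
#              sed -n 's/^ID_NET_NAME_SLOT=//p')
#       if [ -n "$slot" ]; then echo "${i##*/} $slot"; fi
#   done | find_dup_slots
```

For the values listed above, the only slot name reported is ens6, shared by the interfaces currently called ens6 and eth1.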

zenczykowski commented 4 years ago

lspci tree graph with irrelevant portions stripped

$ lspci -t -nn
-+-[0000:ff]-+-08.0
 |           +-08.2
...
 |           \-1f.2
 +-[0000:80]-+-00.0-[81]--
 |           +-01.0-[82]--
 |           +-01.1-[83]--
 |           +-02.0-[84-86]--+-00.0   <-- x550 nic port 1
 |           |               \-00.1   <-- x550 nic port 2
 |           +-03.0-[87]--
 |           +-03.2-[88]--
 |           +-05.0
 |           +-05.1
 |           +-05.2
 |           \-05.4
 +-[0000:7f]-+-08.0
 |           +-08.2
...
 |           \-1f.2
 \-[0000:00]-+-00.0
             +-01.0-[01]----00.0
             +-01.1-[02]----00.0
             +-02.0-[03-06]----00.0-[04-06]--+-00.0-[05]----00.0  <-- tehuti port 1
             |                               \-08.0-[06]----00.0  <-- tehuti port 2
             +-03.0-[07]--+-00.0
             |            \-00.1
             +-05.0
...
             +-1b.0
             +-1c.0-[08]----00.0
             +-1c.3-[09]--
             +-1c.4-[0a]--
             +-1d.0
             +-1f.0
             +-1f.2
             \-1f.3

zenczykowski commented 4 years ago

I assume this probably means the x550 based card is based around a single dual port chip, while the tehuti card is simply two single port chips on a single card.

zenczykowski commented 4 years ago

I think the easiest thing to do is probably just to update /etc/udev/rules.d/70-persistent-net.rules to manually name these NICs based on their MAC addresses.

AFAICT PCI-slot-based naming is just broken for a NIC that puts more than one chip behind a single PCI slot...
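
A minimal sketch of what such rules could look like, assuming the same match keys already used in the rules file above (SUBSYSTEM, ACTION, ATTR{address}, NAME). The helper name make_mac_rule and the MAC addresses in the usage comment are placeholders; the real addresses are elided in this thread.

```shell
# Hypothetical helper: print one udev rule that names the interface with the
# given MAC address. The MAC is lowercased because sysfs reports the address
# attribute in lowercase hex.
make_mac_rule() {
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    printf 'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="%s", NAME="%s"\n' \
        "$mac" "$2"
}

# Usage with placeholder MACs (substitute the real ones from "ip link"):
#   make_mac_rule 00:11:22:33:44:55 ens6f0 >> /etc/udev/rules.d/70-persistent-net.rules
#   make_mac_rule 00:11:22:33:44:56 ens6f1 >> /etc/udev/rules.d/70-persistent-net.rules
```

Matching on the MAC address sidesteps the PCI topology entirely, at the cost of having to edit the rules whenever a card is swapped.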


A little extra background: this is a dual-socket Z840 workstation, and both of these dual 10-gbit NICs:

https://www.startech.com/Networking-IO/Adapter-Cards/10gb-pcie-network-card~ST10GSPEXNDP Tehuti - PCI Express x8 Male [PCIe 3.0 8x is 7.88 GB/s, or around 63 gbps]

and

https://www.startech.com/Networking-IO/Adapter-Cards/dual-port-network-card~ST10GPEXNDPI Intel x550 - PCI Express x4 Male [PCIe 3.0 4x is 3.94 GB/s, or around 32 gbps which is < 40gbps for full duplex rx and tx at 10 gbit on both ports]

are plugged into 16x pcie cpu backed slots (one on each socket)
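
The bandwidth figures quoted in the brackets can be reproduced with a short calculation. This is a sketch that assumes the usual PCIe 3.0 parameters (8 GT/s per lane, 128b/130b line encoding) and ignores packet and protocol overhead; the function name pcie3_bandwidth is made up for illustration. Note that PCIe links are full duplex, so these are per-direction numbers.

```shell
# Hypothetical helper: raw PCIe 3.0 bandwidth for a link with the given number
# of lanes, per direction. Each lane runs at 8 GT/s and 128b/130b encoding
# leaves 128 payload bits out of every 130 transferred.
pcie3_bandwidth() {
    awk -v lanes="$1" 'BEGIN {
        gbps = lanes * 8 * 128 / 130
        printf "x%d: %.2f Gbit/s, %.2f GB/s\n", lanes, gbps, gbps / 8
    }'
}

pcie3_bandwidth 8   # Tehuti-based ST10GSPEXNDP: x8: 63.02 Gbit/s, 7.88 GB/s
pcie3_bandwidth 4   # X550-based ST10GPEXNDPI:   x4: 31.51 Gbit/s, 3.94 GB/s
```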

I'm not even sure if the driver can tell that it's a single pci card with 2 chips on it. It could just as easily (I think) be a pci switch/bifurcation with 2 cards plugged in...


Altogether I think this might not be fixable...

acooks commented 4 years ago

Looks like ID_NET_NAME_SLOT is not appropriate for this card.

I don't fully understand how udev decides to use that, as opposed to ID_NET_NAME_PATH, which looks like it would work correctly.

Maybe if we change the driver to stop advertising hotplug support for this card, udev would use the _PATH scheme.
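
Some background on the question above, as far as it applies to systemd-based udev: the choice among the ID_NET_NAME_* candidates is normally made by the NamePolicy= setting of the matching .link file, and the stock 99-default.link tries the slot name before the path name, which is consistent with ens6 being picked here. Below is an untested sketch of a userspace override that prefers the path name for this driver only. It assumes the driver name udev sees is tn40xx, and the function name and target file name are hypothetical; explicit udev rules that set NAME, such as MAC-based entries in 70-persistent-net.rules, may still take precedence.

```shell
# Hypothetical helper: print a systemd .link file that selects the path-based
# name (ID_NET_NAME_PATH) for interfaces bound to the tn40xx driver.
# Assumption: the driver is exposed to udev under the name "tn40xx".
print_tn40xx_link() {
    cat <<'EOF'
[Match]
Driver=tn40xx

[Link]
NamePolicy=path
EOF
}

# Example (as root); the file name is made up and only needs to sort before
# 99-default.link so that it is matched first:
#   print_tn40xx_link > /etc/systemd/network/10-tn40xx-path.link
```

With the values reported earlier in the thread, this would be expected to yield enp5s0 and enp6s0 rather than ens6f0 and ens6f1, so it gives unique names but not the function-style names originally asked for.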

zenczykowski commented 4 years ago

If you have a patch for me to test I can do that no problem. In the meantime I've added manual mac->name mappings.

On Thu, Feb 27, 2020 at 2:09 PM Andrew Cooks notifications@github.com wrote:

> Looks like ID_NET_NAME_SLOT is not appropriate for this card.
>
> I don't fully understand how udev decides to use that, as opposed to ID_NET_NAME_PATH, which looks like it would work correctly.
>
> Maybe if we change the driver to stop advertising hotplug support for this card, udev would use the _PATH scheme.
