canonical / checkbox

Checkbox is a testing framework used to validate device compatibility with Ubuntu Linux. It’s the testing tool developed for the purposes of the Ubuntu Certification program.
https://checkbox.readthedocs.io
GNU General Public License v3.0

Iperf Tests Passed But ethtool Check Shows No Link Detected #331

Open mreed8855 opened 1 year ago

mreed8855 commented 1 year ago

Bug Description

On a few Dell network device certifications, I have noticed that the ethertool_check_device test reports that the device has no link, yet the iperf test passes on the same device. The ethertool_check_device test is a non-blocker. When this happens, it is on the second port of the card. https://certification.canonical.com/certificates/2302-14156/

Iperf test passing: ethernet/multi_iperf3_nic_device3_enp22s0f1 https://certification.canonical.com/hardware/202301-31143/submission/298565/test/193001/result/31146942/

ethertool check shows no link detected: ethernet/ethertool_check_device3_enp22s0f1 https://certification.canonical.com/hardware/202301-31143/submission/298565/test/192999/result/31146935/

Settings for enp22s0f1:
    Supported ports: [ ]
    Supported link modes:   10000baseT/Full
                            25000baseCR/Full
                            25000baseSR/Full
                            50000baseCR2/Full
                            100000baseSR4/Full
                            100000baseCR4/Full
                            100000baseLR4_ER4/Full
                            50000baseSR2/Full
                            10000baseSR/Full
                            10000baseLR/Full
    Supported pause frame use: Symmetric
    Supports auto-negotiation: Yes
    Supported FEC modes: None RS BASER
    Advertised link modes:  10000baseT/Full
                            25000baseCR/Full
                            25000baseSR/Full
                            50000baseCR2/Full
                            100000baseSR4/Full
                            100000baseCR4/Full
                            100000baseLR4_ER4/Full
                            50000baseSR2/Full
                            10000baseSR/Full
                            10000baseLR/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Advertised FEC modes: None RS BASER
    Speed: Unknown!
    Duplex: Unknown! (255)
    Auto-negotiation: off
    Port: Other
    PHYAD: 0
    Transceiver: internal
    Supports Wake-on: d
    Wake-on: d
    Current message level: 0x00000007 (7)
                           drv probe link
    Link detected: no
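When triaging results like this, the two fields that matter are `Speed` and `Link detected`. A minimal Python sketch of pulling them out of saved `ethtool` output is below. `parse_ethtool` is a hypothetical helper, not part of Checkbox, and it assumes the usual `Speed: <N>Mb/s` and `Link detected: yes|no` line formats.

```python
import re


def parse_ethtool(text):
    """Return (link_detected, speed_mbps) from `ethtool <iface>` output.

    Hypothetical helper for illustration only.
    link_detected is None if the line is missing.
    speed_mbps is None when ethtool prints something like 'Unknown!'.
    """
    link = None
    speed = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Link detected:"):
            link = line.split(":", 1)[1].strip() == "yes"
        elif line.startswith("Speed:"):
            match = re.match(r"Speed:\s*(\d+)Mb/s", line)
            speed = int(match.group(1)) if match else None
    return link, speed


# Trimmed copy of the output pasted above.
sample = """Settings for enp22s0f1:
    Speed: Unknown!
    Duplex: Unknown! (255)
    Link detected: no
"""
print(parse_ethtool(sample))  # (False, None)
```

A result of `(False, None)` on a port that then passes iperf is the mismatch this issue is about.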

To Reproduce

  1. Set up an iperf server
  2. Run a full certification or network certification.

Environment

Ubuntu 22.04.1 LTS

Relevant log output

- Checkbox session(s) (located in `/var/tmp/checkbox-ng/sessions/`, you usually want to select the most recent one)

- logs from the impacted components (e.g. `lsblk` if this is related to an issue when testing a disk...); a safe option is to install and run `sosreport` to gather as much log as possible.

Additional context

No response

mreed8855 commented 1 year ago

Here are two more examples:

Broadcom NetXtreme-E P2100D BCM57508 2x100G QSFP PCIE Ethernet https://certification.canonical.com/certificates/2302-14153/

Mellanox ConnectX-6 Single Port HDR100 QSFP56 PCIE Adapter https://certification.canonical.com/certificates/2302-14155/

bladernr commented 1 year ago

Yeah, so there are a couple things happening here... first I suspect some sort of kernel/driver bug. As you pointed out, ethtool is not reporting the card being connected:

enp22s0f1:
SNIP
Speed: Unknown!
Duplex: Unknown! (255)
SNIP
Link detected: no

And if you look at the results of ethernet/info_automated, it shows both ports connected, but only at 50Gb/s:

Category: NETWORK
Interface: enp22s0f0
Product: Ethernet Controller E810-C for QSFP (Ethernet 100G 2P E810-C Adapter)
Vendor: Intel Corporation
Driver: ice
Driver Version: 
Path: /devices/pci0000:15/0000:15:01.0/0000:16:00.0
Id: [8086:1592]
Subsystem Id: [8086:000b]
Mac: 40:a6:b7:52:b9:80
Carrier Status: Connected
Ipv4: 10.1.1.14
Ipv6: fe80::42a6:b7ff:fe52:b980/64
Speed: 50000
Supported Modes: 
    25000baseCR/Full
    25000baseSR/Full
    50000baseCR2/Full
    50000baseSR2/Full
    100000baseCR4/Full
    100000baseLR4_ER4/Full
    100000baseSR4/Full
Advertised Modes: 
    25000baseCR/Full
    50000baseCR2/Full
    100000baseCR4/Full
Partner Modes: 

Category: NETWORK
Interface: enp22s0f1
Product: Ethernet Controller E810-C for QSFP (Ethernet 100G 2P E810-C Adapter)
Vendor: Intel Corporation
Driver: ice
Driver Version: 
Path: /devices/pci0000:15/0000:15:01.0/0000:16:00.1
Id: [8086:1592]
Subsystem Id: [8086:000b]
Mac: 40:a6:b7:52:b9:81
Carrier Status: Connected
Ipv4: 10.1.1.15
Ipv6: fe80::42a6:b7ff:fe52:b981/64
Speed: 50000
Supported Modes: 
    25000baseCR/Full
    25000baseSR/Full
    50000baseCR2/Full
    50000baseSR2/Full
    100000baseCR4/Full
    100000baseLR4_ER4/Full
    100000baseSR4/Full
Advertised Modes: 
    25000baseCR/Full
    50000baseCR2/Full
    100000baseCR4/Full
Partner Modes: 

But then if you look at the results of the iperf test, both devices pass because they test out at a passing speed for 100Gb/s:

INFO:root:Avg Transfer speed: 88597.61222222223 Mb/s
INFO:root:88.60% of theoretical max 100000 Mb/s
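The percentage in the log is just the average transfer speed divided by the assumed link speed. A small sketch of that arithmetic (the function name `percent_of_max` is made up for illustration) shows why the 50000 Mb/s figure from info_automated cannot be right: the same measurement would be well over 100% of that link speed.

```python
def percent_of_max(avg_mbps, link_mbps):
    """Measured throughput as a percentage of the link speed (both in Mb/s)."""
    return 100.0 * avg_mbps / link_mbps


avg = 88597.61222222223  # Avg Transfer speed from the iperf log above

# Against the 100000 Mb/s the iperf test assumed:
print(f"{percent_of_max(avg, 100000):.2f}% of 100000 Mb/s")  # 88.60% of 100000 Mb/s

# Against the 50000 Mb/s that info_automated reported:
print(f"{percent_of_max(avg, 50000):.2f}% of 50000 Mb/s")  # 177.20% of 50000 Mb/s
```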

So do we need bugs against ethtool and sysfs/kernel? I think the numbers in info_automated_server are pulled from the sysfs entries for the device:

@classmethod
def get_speed(cls, interface):
    speed_file = os.path.join(cls.sys_path, interface, 'speed')
    try:
        return open(speed_file, 'r').read().strip()
    except IOError:
        return 'UNKNOWN'

So what we have here is:

  1. ethtool is showing no link and no speed when a link and speed exist
  2. sysfs is showing a link and speed but showing what appears to be an incorrect speed
  3. possibly driver is reporting erroneously to both sysfs and ethtool

That is an E810. Can you have them retry this using the upstream Intel ICE driver to see if the latest upstream version works? Likewise, can they also check using 23.04 or 22.10 with the 5.19 kernel to see if that fares better?

On the Checkbox side, the test should fail if the observed speed doesn't match what the system is reporting...
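A rough sketch of what such a check could look like, assuming the measured iperf average and the system-reported link speed are both available in Mb/s. The function name and the 5% tolerance are assumptions for illustration, not existing Checkbox code.

```python
def speed_is_consistent(measured_mbps, reported_mbps, tolerance=0.05):
    """Return True if the measurement is plausible for the reported link speed.

    Hypothetical check: a link cannot carry more than its reported speed,
    so a measurement above reported_mbps (plus a small tolerance) means the
    reported value is wrong. A missing or non-positive reported speed is
    treated as inconsistent, since a link that moves traffic should report one.
    """
    if reported_mbps is None or reported_mbps <= 0:
        return False
    return measured_mbps <= reported_mbps * (1 + tolerance)


measured = 88597.61  # Mb/s, from the iperf log

print(speed_is_consistent(measured, 100000))  # True
print(speed_is_consistent(measured, 50000))   # False: sysfs value is too low
print(speed_is_consistent(measured, None))    # False: ethtool reported Unknown!
```

With a check like this, both mismatches seen here (50000 from sysfs, Unknown! from ethtool) would fail the test instead of passing silently.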

It would also be helpful to see what sysfs is showing for speed, what dmesg shows regarding the device speed, and maybe info from mii-tool.
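For the sysfs part, a small sketch along the lines of the `get_speed` method quoted earlier can dump the relevant per-interface files in one go. The helper name `read_link_attrs` is hypothetical. It assumes the standard `/sys/class/net/<interface>/` layout with `carrier`, `operstate`, `speed` and `duplex` entries, and returns 'UNKNOWN' for any file that cannot be read.

```python
import os


def read_link_attrs(interface, sys_path="/sys/class/net"):
    """Read link-related sysfs attributes for one interface.

    Hypothetical helper for gathering debug info. Unreadable or missing
    files are reported as 'UNKNOWN' rather than raising.
    """
    attrs = {}
    for name in ("carrier", "operstate", "speed", "duplex"):
        path = os.path.join(sys_path, interface, name)
        try:
            with open(path, "r") as handle:
                attrs[name] = handle.read().strip()
        except OSError:
            attrs[name] = "UNKNOWN"
    return attrs


if __name__ == "__main__":
    # Interface name taken from this report; adjust for the system under test.
    for key, value in read_link_attrs("enp22s0f1").items():
        print(f"{key}: {value}")
```

Running this next to `ethtool enp22s0f1` on the affected system would show directly whether the two sources disagree.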

mreed8855 commented 1 year ago

I have found another example where ethtool shows no link but iperf passes.

Dell Technologies PowerEdge R7615
ND4PT - Intel(R) Ethernet 10G 4P X710-T4L-t Adapter
R1KTR - Intel(R) Ethernet 25G 4P E810-XXV OCP
https://certification.canonical.com/certificates/2303-14254/

bladernr commented 1 year ago

This is all on 16G right? Because those two are established devices that should not have any issue here... at least I think that is the case for other systems running Jammy

On Tue, Mar 14, 2023 at 4:21 PM mreed8855 @.***> wrote:

I have found another example where ethtool shows no link but iperf passes. Dell Technologies PowerEdge R7615 ND4PT- Intel(R) Ethernet 10G 4P X710-T4L-t Adapter R1KTR -Intel(R) Ethernet 25G 4P E810-XXV OCP https://certification.canonical.com/certificates/2303-14254/


-- Jeff Lane - Engineering Manager, Tools Developer, Warrior Poet, Lover of Pie Ubuntu Ham: W4KDH Freenode IRC: bladernr or bladernr_ gpg: 1024D/3A14B2DD 8C88 B076 0DD7 B404 1417 C466 4ABD 3635 3A14 B2DD