canonical / checkbox

Checkbox
https://checkbox.readthedocs.io
GNU General Public License v3.0

Iperf3 tests are incorrectly failing if there are other Ethernet interfaces that cannot be up'd. #401

Open djacobs98 opened 1 year ago

djacobs98 commented 1 year ago

Bug Description

On some of the HW platforms I am testing, I am seeing additional Ethernet interfaces that cannot be used. For example, one platform has a Fiber Ethernet card, but we don't have the media converters.

So these interfaces appear to Ubuntu, but you can't use '$ sudo ip link set dev <interface> up' to bring them up; you'll get an error.

When the iperf3 tests under the Stress test plan run, the test brings down all other Ethernet interfaces except for the interface being tested. After iperf3 finishes, the test tries to bring all interfaces back up. But because these unusable interfaces cause an error when you try to bring them up, this error causes the entire iperf3 test to fail.

The iperf3 test should make a note of the interfaces that are actually up at the beginning of the test, and only bring those down and back up. The rest should be ignored so they don't cause false failures.
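A rough sketch of the "note what is up first" idea, not the actual Checkbox code. It assumes the administrative state can be read from the `flags` file under `/sys/class/net/<iface>/`, where bit `0x1` is `IFF_UP`. The function names and the `sysfs_root` parameter are hypothetical, and unlike the real test this does not filter for Ethernet devices.

```python
import os

IFF_UP = 0x1  # administrative "up" flag, as defined in <linux/if.h>


def is_admin_up(iface, sysfs_root="/sys/class/net"):
    """Return True if the interface has IFF_UP set in its sysfs flags file."""
    flags_path = os.path.join(sysfs_root, iface, "flags")
    if not os.path.isfile(flags_path):
        # Entries such as bonding_masters are plain files, not interfaces.
        return False
    with open(flags_path) as f:
        return bool(int(f.read().strip(), 16) & IFF_UP)


def interfaces_to_toggle(target, sysfs_root="/sys/class/net"):
    """List interfaces that are up right now, excluding the target and lo.

    Only these would be brought down for the test and back up afterwards;
    anything that was already down is left alone.
    """
    return sorted(
        iface
        for iface in os.listdir(sysfs_root)
        if iface not in (target, "lo") and is_admin_up(iface, sysfs_root)
    )
```

Calling `interfaces_to_toggle("enp1s0")` before the test would give the list to bring down and later restore, so interfaces like enp0s30f4 that were never up would not be touched.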

To Reproduce

  1. Take a system with 2 or more interfaces.
  2. Only connect a cable to 1 interface (e.g. enp1s0)
  3. Run the iperf3 test on that interface (enp1s0)
  4. iperf3 test will fail when it tries to restore the connection on the other, disconnected, interface.

Environment

Checkbox snap on either Classic or Core. System must have 2 or more Ethernet interfaces.

Relevant log output

Launchpad : https://bugs.launchpad.net/lookout-canyon/+bug/2000283

[Checkbox job `com.canonical.certification::ethernet/iperf3_enp1s0` output]

stderr
------
WARNING:root:Removing iperf server 10.102.182.100 (10.102.182.100) from
WARNING:root:test list since it's not within 10.102.88.0/23.
WARNING:root:Removing iperf server 10.102.182.137 (10.102.182.137) from
WARNING:root:test list since it's not within 10.102.88.0/23.
WARNING:root:Removing iperf server 10.102.182.101 (10.102.182.101) from
WARNING:root:test list since it's not within 10.102.88.0/23.
INFO:root:Testing enp1s0 against 10.102.89.198
INFO:root:Have successfully pinged 10.102.89.198 on enp1s0
INFO:root:-------------------- Test Run Number 1 --------------------
INFO:root:Using 1 thread.
INFO:root:Connecting to port 5201 on server....
INFO:root:Avg Transfer speed: 939.1 Mb/s
INFO:root:93.91% of theoretical max 1000 Mb/s
INFO:root:Average CPU utilization: 3.0%
INFO:root:
INFO:root:-------------------- Test Run Number 2 --------------------
INFO:root:Using 1 thread.
INFO:root:Connecting to port 5201 on server....
INFO:root:Avg Transfer speed: 939.3 Mb/s
INFO:root:93.93% of theoretical max 1000 Mb/s
INFO:root:Average CPU utilization: 2.9%
INFO:root:
INFO:root:-------------------- Test Run Number 3 --------------------
INFO:root:Using 1 thread.
INFO:root:Connecting to port 5201 on server....
INFO:root:Avg Transfer speed: 939.1 Mb/s
INFO:root:93.91% of theoretical max 1000 Mb/s
INFO:root:Average CPU utilization: 2.9%
INFO:root:
INFO:root:-------------------- Test Run Number 4 --------------------
INFO:root:Using 1 thread.
INFO:root:Connecting to port 5201 on server....
INFO:root:Avg Transfer speed: 939.1777777777778 Mb/s
INFO:root:93.92% of theoretical max 1000 Mb/s
INFO:root:Average CPU utilization: 3.0%
INFO:root:
RTNETLINK answers: Invalid argument
ERROR:root:Failed to restore enp0s30f4:Command '['ip', 'link', 'set', 'dev', 'enp0s30f4', 'up']' returned non-zero exit status 2.
RTNETLINK answers: Invalid argument
ERROR:root:Failed to restore enp0s30f5:Command '['ip', 'link', 'set', 'dev', 'enp0s30f5', 'up']' returned non-zero exit status 2.

Additional context

No response

bladernr commented 1 year ago

For a full run, this is by design (somewhat). The expectation is that testers will set up all devices on a machine prior to testing, so that they are all up and running and are tested in sequence. My expectation is that a device that is either down at test time, or fails to come up for whatever reason, should trigger a test failure and thus trigger further review.

I think making it more stateful is fine as long as that statefulness is optional. I don't want to see tests passing on a server with 8 network ports because the tester failed to configure 7 of them. As I said, for Server that is one flag that triggers a deeper review of the results and questions back to the test engineer.

One initial thought is that it would also be helpful for the script to throw an error or warning BEFORE testing to indicate that one or more network devices were not up. And for the rest, maybe add something like '--ignore-down-devices' to tell it to disregard things that are not up at run time and can't be brought back up later.
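As a minimal sketch of how that could look, assuming the script uses argparse: the flag name comes from the suggestion above, while `build_parser`, `preflight`, and the `states` mapping (interface name to "was up before the test") are hypothetical names for illustration only.

```python
import argparse
import logging


def build_parser():
    """Hypothetical parser showing only the proposed opt-in flag."""
    parser = argparse.ArgumentParser(description="iperf3 test wrapper (sketch)")
    parser.add_argument(
        "--ignore-down-devices",
        action="store_true",
        help="disregard devices that were not up before the test started",
    )
    return parser


def preflight(states, target, ignore_down):
    """Warn about devices that are down before testing.

    states maps interface name -> True if the interface was up.
    Returns (interfaces to bring down and restore, whether to proceed as OK).
    """
    down = sorted(n for n, up in states.items() if not up and n != target)
    for name in down:
        logging.warning("%s was not up before testing", name)
    to_toggle = sorted(n for n, up in states.items() if up and n != target)
    # Default behaviour stays strict: any down device is flagged as a failure.
    ok = ignore_down or not down
    return to_toggle, ok
```

Without the flag, a down device still produces a failure for review. With it, the device is only logged and skipped.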

> For example, one platform has a Fiber Ethernet card, but we don't have the media converters.

This is a bit concerning... are those devices simply never tested in a certified machine?

djacobs98 commented 1 year ago

This is from the Lookout-Canyon project, which everyone says "is a little different." We will often receive hardware from the partner with no documentation or requests other than "let us know if it works." I don't know why an IoT device would have Fiber, but here we are.

Other times we're given an early reference board which has multiple Ethernet controllers but only 1 physical port. As a result, Ubuntu will report several interfaces but only the one(s) with a physical port can be used. I call these "phantom interfaces." You can down a phantom interface, but trying to bring one up will cause a test-breaking failure. The interface was never up in the first place and can never be brought up, so it should be ignored by this step of iperf3.

My concern is that the iperf3 test will give false negatives. Interface A shouldn't fail because interface B isn't working properly (for whatever reason), yet that's exactly what's happening here.
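To illustrate, here is a sketch of a restore step that only reports failures for interfaces that were up before the test. It is not the existing Checkbox implementation: `restore_interfaces`, the `was_up` mapping, and the injectable `run` parameter are hypothetical. The `ip link set dev <iface> up` command is the one shown in the log above.

```python
import logging
import subprocess


def restore_interfaces(was_up, run=subprocess.check_call):
    """Bring interfaces back up after the test.

    was_up maps interface name -> True if it was up before the test.
    Returns the interfaces that were up before but could not be restored.
    """
    failed = []
    for iface, up_before in sorted(was_up.items()):
        if not up_before:
            # A "phantom" interface was never up, so leave it alone.
            logging.info("Skipping %s: it was not up before the test", iface)
            continue
        try:
            run(["ip", "link", "set", "dev", iface, "up"])
        except (subprocess.CalledProcessError, OSError) as exc:
            logging.error("Failed to restore %s: %s", iface, exc)
            failed.append(iface)
    return failed
```

With this shape, the result for interface A would depend on interface B only if B was working before the test and could not be restored afterwards.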

baconYao commented 4 months ago

The issue can be reproduced on the Baoshan Project (G1200-evk).

There's a CAN interface (can0) that is DOWN by default, so the tester has to bring it up manually every time before iperf3 testing.

ceqa@ubuntu:~$ ip -c a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: can0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN group default qlen 10
    link/can 
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:e7:e9:23:76 brd ff:ff:ff:ff:ff:ff
    inet 10.102.88.203/23 metric 100 brd 10.102.89.255 scope global dynamic eth0
       valid_lft 590sec preferred_lft 590sec
    inet6 fe80::20c:e7ff:fee9:2376/64 scope link 
       valid_lft forever preferred_lft forever
4: wlp1s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 2c:3b:70:eb:13:8d brd ff:ff:ff:ff:ff:ff

Shouldn't we only care about, and only switch, the target Ethernet interface?
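One way to keep non-Ethernet devices such as can0 out of the picture is sketched below. It assumes the link type can be read from `/sys/class/net/<iface>/type`, where 1 is `ARPHRD_ETHER`. The function names and the `sysfs_root` parameter are hypothetical. Note that Wi-Fi devices such as wlp1s0 also report type 1, so a real check would need more than this.

```python
import os

ARPHRD_ETHER = 1  # Ethernet link type, from <linux/if_arp.h>


def link_type(iface, sysfs_root="/sys/class/net"):
    """Return the numeric ARPHRD link type of an interface."""
    with open(os.path.join(sysfs_root, iface, "type")) as f:
        return int(f.read().strip())


def other_ethernet_interfaces(target, sysfs_root="/sys/class/net"):
    """List interfaces with an Ethernet link type, excluding the target."""
    result = []
    for iface in sorted(os.listdir(sysfs_root)):
        if iface == target:
            continue
        if not os.path.isfile(os.path.join(sysfs_root, iface, "type")):
            continue  # not an interface directory
        if link_type(iface, sysfs_root) == ARPHRD_ETHER:
            result.append(iface)
    return result
```

On the system listed above, this would leave out lo and can0 when eth0 is the target.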

syncronize-issues-to-jira[bot] commented 4 months ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/CHECKBOX-1428.

This message was autogenerated