coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

1298.5.0 unstable networking (bare metal, bonded 2x10gig with VLANs) #1855

Closed jlmeeker closed 7 years ago

jlmeeker commented 7 years ago

Bug

Multiple bare metal servers fail to pass network traffic after upgrade to 1298.5.0.

All network interfaces show as up and appear to be in a normal state (as does the bond). No network traffic (ICMP, TCP) passes out of the NICs.

Container Linux Version

cat /etc/lsb-release; uname -a

DISTRIB_ID="Container Linux by CoreOS"
DISTRIB_RELEASE=1298.5.0
DISTRIB_CODENAME="Ladybug"
DISTRIB_DESCRIPTION="Container Linux by CoreOS 1298.5.0 (Ladybug)"

uname -a
Linux xxxxxx.xxxxx.com 4.9.9-coreos-r1 #1 SMP Tue Feb 28 00:06:10 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz GenuineIntel GNU/Linux

Environment

System Information
Manufacturer: Dell Inc.
Product Name: PowerEdge R630

NICs
Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
Subsystem: Dell Ethernet 10G 4P X710/I350 rNDC

Expected Behavior

All Ethernet devices should come up as they did before the update.

Actual Behavior

No traffic leaves the server. I cannot ping or make TCP connections on any previously working interface.

Reproduction Steps

  1. Start with base Container Linux version 1235.12.0 (or older).
  2. Let the system upgrade to 1298.5.0 and reboot.
  3. Log in (console) after the reboot, run ifconfig/ip addr/ip link, and see all network interfaces configured with full link reported.
  4. Try to do anything on the network; it all fails.
  5. Force a revert to the previous version and reboot; everything comes up fine.
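
The checks in steps 3 and 4 could be scripted roughly as below. This is only a sketch, not what was run on the affected hosts: the function names `report_link_state` and `can_reach` are made up for illustration, and the ping target is whatever address the caller supplies (none is given in this report). It assumes the standard Linux sysfs layout under /sys/class/net and the iputils `ping`.

```shell
#!/bin/sh
# Print "<interface> <operstate>" for each interface under a sysfs net
# directory. The directory is a parameter (default: /sys/class/net) so the
# function can also be pointed at a test fixture.
report_link_state() {
    net_dir="${1:-/sys/class/net}"
    for dev in "$net_dir"/*; do
        # Skip entries that have no operstate file (e.g. bonding_masters).
        [ -r "$dev/operstate" ] || continue
        printf '%s %s\n' "${dev##*/}" "$(cat "$dev/operstate")"
    done
}

# Succeed (exit status 0) if the target answers at least one of three ICMP
# echo requests; fail otherwise. The target is supplied by the caller.
can_reach() {
    ping -c 3 -W 2 "$1" >/dev/null 2>&1
}
```

On a host showing this bug one would expect `report_link_state` to list every interface as up while `can_reach` fails for an address that answered before the upgrade.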

bgilbert commented 7 years ago

Are you still seeing this with the current stable release?

jlmeeker commented 7 years ago

I am out of town until Monday. I'll check next week and let you know.

jlmeeker commented 7 years ago

Just upgraded one of our machines to (stable) 1409.7.0 and it successfully booted and seems to be running fine. I'll leave it for a few days and see how it goes.

jlmeeker commented 7 years ago

My test node is still running fine. I'll start upgrading other nodes with the current stable.

jlmeeker commented 7 years ago

This issue seems to be resolved, closing.