F5Networks / f5-google-gdm-templates

Google Deployment Templates for quickly deploying BIG-IP services in Google Cloud Platform

Management routes lost in software upgrade #45

Closed curtkersey closed 3 years ago

curtkersey commented 4 years ago

Do you already have an issue opened with F5 support?

No support case has been opened for this issue. A support case was opened on the original issue, which prompted the need to load an engineering HF for bug ID 852437.

Description

Deployed v3.4 of template, https://github.com/F5Networks/f5-google-gdm-templates/tree/master/supported/failover/same-net/via-lb/3nic/existing-stack/byol. imageName: f5-bigip-14-1-2-3-0-0-5-byol-all-modules-2boot-loc-19121814234.

On upgrade (to the same version, 14.1.2.3), I was no longer able to connect to the management IP over either SSH or TMUI. I also tried to connect from a machine local to the BIG-IP (same GCP VPC), and that did not work either.

Found issue with management routes being goofed up. I grabbed the commands from /var/log/cloud/google/install.log that set the management routes. I ran those commands, then it worked.

Routes after upgrade (broken):

    netstat -rn
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
    0.0.0.0         10.10.2.1       0.0.0.0         UG        0 0          0 external
    10.10.1.1       0.0.0.0         255.255.255.255 UH        0 0          0 mgmt
    10.10.2.0       10.10.2.1       255.255.255.0   UG        0 0          0 external
    10.10.2.1       0.0.0.0         255.255.255.255 UH        0 0          0 external
    10.10.3.0       10.10.3.1       255.255.255.0   UG        0 0          0 internal
    10.10.3.1       0.0.0.0         255.255.255.255 UH        0 0          0 internal
    127.1.1.0       0.0.0.0         255.255.255.0   U         0 0          0 tmm
    127.7.0.0       127.1.1.253     255.255.0.0     UG        0 0          0 tmm
    127.20.0.0      0.0.0.0         255.255.0.0     U         0 0          0 tmm_bp

Routes after adding management routes back in:

    netstat -rn
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
    0.0.0.0         10.10.2.1       0.0.0.0         UG        0 0          0 external
    0.0.0.0         10.10.1.1       0.0.0.0         UG        0 0          0 mgmt
    10.10.1.0       10.10.1.1       255.255.255.0   UG        0 0          0 mgmt
    10.10.1.1       0.0.0.0         255.255.255.255 UH        0 0          0 mgmt
    10.10.2.0       10.10.2.1       255.255.255.0   UG        0 0          0 external
    10.10.2.1       0.0.0.0         255.255.255.255 UH        0 0          0 external
    10.10.3.0       10.10.3.1       255.255.255.0   UG        0 0          0 internal
    10.10.3.1       0.0.0.0         255.255.255.255 UH        0 0          0 internal
    127.1.1.0       0.0.0.0         255.255.255.0   U         0 0          0 tmm
    127.7.0.0       127.1.1.253     255.255.0.0     UG        0 0          0 tmm
    127.20.0.0      0.0.0.0         255.255.0.0     U         0 0          0 tmm_bp
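
For anyone who wants to check for this condition from a script, here is a rough Python sketch that parses netstat -rn text and reports which management routes are absent. It assumes the eight-column layout shown in this issue and an interface named mgmt. The function names and the BROKEN/FIXED samples (trimmed copies of the mgmt and default rows above) are illustrative only, not part of the template or any F5 tooling.

```python
def parse_routes(netstat_output):
    """Parse `netstat -rn` text into a list of dicts, one per route row."""
    routes = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Route rows have 8 columns and begin with a dotted IPv4 destination;
        # this skips the "Kernel IP routing table" and column header lines.
        if len(fields) != 8 or fields[0].count(".") != 3:
            continue
        routes.append({
            "destination": fields[0],
            "gateway": fields[1],
            "genmask": fields[2],
            "flags": fields[3],
            "iface": fields[7],
        })
    return routes


def missing_mgmt_routes(netstat_output, iface="mgmt"):
    """Return the kinds of gateway route ('default', 'subnet') absent on iface."""
    routes = [r for r in parse_routes(netstat_output) if r["iface"] == iface]
    has_default = any(
        r["destination"] == "0.0.0.0" and "G" in r["flags"] for r in routes
    )
    has_subnet = any(
        r["destination"] != "0.0.0.0"
        and r["genmask"] != "255.255.255.255"
        and "G" in r["flags"]
        for r in routes
    )
    missing = []
    if not has_default:
        missing.append("default")
    if not has_subnet:
        missing.append("subnet")
    return missing


# Trimmed samples copied from the two routing tables in this issue.
BROKEN = """Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 10.10.2.1 0.0.0.0 UG 0 0 0 external
10.10.1.1 0.0.0.0 255.255.255.255 UH 0 0 0 mgmt
"""

FIXED = """Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 10.10.2.1 0.0.0.0 UG 0 0 0 external
0.0.0.0 10.10.1.1 0.0.0.0 UG 0 0 0 mgmt
10.10.1.0 10.10.1.1 255.255.255.0 UG 0 0 0 mgmt
10.10.1.1 0.0.0.0 255.255.255.255 UH 0 0 0 mgmt
"""

if __name__ == "__main__":
    print("broken table is missing:", missing_mgmt_routes(BROKEN))
    print("fixed table is missing:", missing_mgmt_routes(FIXED))
```

In the broken table only the host route to 10.10.1.1 is left on mgmt, so both the default and the subnet route are reported as missing.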

Severity Level

Severity 2 -- big issue for my customer, who needs to load the engineering HF.

JeffGiroux commented 4 years ago

I set up my lab and ran into this same issue. I was on f5-bigip-14-1-2-3-0-0-5-payg-good-1gbps-191218142235, uploaded the 14.1.2.3 ISO (yes...same version), upgraded to boot slot 2, and rebooted. After the reboot, the device was unreachable other than through the console.

Behavior seems identical to past bug https://github.com/F5Networks/f5-google-gdm-templates/issues/33

Workaround (example)

  1. tmsh delete sys management-route all
  2. tmsh create sys management-route mgmt_gw network 10.1.1.1/32 type interface
  3. tmsh create sys management-route mgmt_net network 10.1.1.0/255.255.255.0 gateway 10.1.1.1
  4. tmsh create sys management-route default gateway 10.1.1.1

Note: you must delete all management routes first, because they still exist in the configuration. For some reason, tmsh list sys management-route still shows the routes but the netstat output does not. So delete the sys management routes first, then add them back in as in the workaround above.
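
If you would rather generate the four workaround commands for your own management subnet than edit them by hand, something like the Python sketch below should do it. It only builds the command strings in the same form as the example above and does not run tmsh. The build_workaround name and its parameters are made up for illustration.

```python
import ipaddress


def build_workaround(gateway, network):
    """Return the tmsh workaround commands for a management gateway and subnet.

    gateway: management gateway IP, e.g. "10.1.1.1"
    network: management subnet in CIDR form, e.g. "10.1.1.0/24"
    """
    gw = ipaddress.ip_address(gateway)
    net = ipaddress.ip_network(network)
    if gw not in net:
        raise ValueError(f"gateway {gw} is not inside {net}")
    return [
        # Stale entries still exist in the tmsh config, so clear them first.
        "tmsh delete sys management-route all",
        # Host route to the gateway via the management interface.
        f"tmsh create sys management-route mgmt_gw network {gw}/32 type interface",
        # Subnet route, written as network/netmask like the example above.
        f"tmsh create sys management-route mgmt_net network "
        f"{net.network_address}/{net.netmask} gateway {gw}",
        # Default route for the management plane.
        f"tmsh create sys management-route default gateway {gw}",
    ]


if __name__ == "__main__":
    for command in build_workaround("10.1.1.1", "10.1.1.0/24"):
        print(command)
```

For the environment in the original report you would pass "10.10.1.1" and "10.10.1.0/24". Still compare the result against what /var/log/cloud/google/install.log shows for your deployment before running anything.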

If you need to know the commands to use, you can do this...

  1. enable serial console in google on the BIG-IP VM
  2. login via serial console
  3. cat /var/log/cloud/google/install.log

P.S. This also happens on 15.1.0.2 when upgrading to 15.1.0.2, so it seems to be tied to the version upgrade itself. A regular reboot causes no harm, but a version upgrade + reboot causes loss of mgmt routing.

shyawnkarim commented 4 years ago

Thanks for reaching out to us with this bug. A DB variable is used for the nic swap and it looks like this may be getting lost during the upgrade.

Internal ID VECLOUD-989 has been created for this.

jtylershaw commented 4 years ago

@shyawnkarim, we've seen this issue when automating upgrade testing from 15.0.1 (template default) to 15.1.0.4 (latest code) as well. This was deployed with GDM template 3.6.0.

shyawnkarim commented 3 years ago

Closing.

This bug has been fixed, and details of the fix can be seen on the F5 Bug Tracker.