datacenter / ACI-Pre-Upgrade-Validation-Script

A script to run validations to detect potential issues that may cause an ACI fabric upgrade to fail
https://datacenter.github.io/ACI-Pre-Upgrade-Validation-Script/
Apache License 2.0

bootflash fail over 50% if node already staged #20

Closed scsheldo closed 6 months ago

scsheldo commented 2 years ago

Ran the script after a spine had downloaded the new image, so the spine now held both the running image and the new image. The script reported a FAIL because the spine was now over 50% bootflash usage. I think this is a false positive.

Gathering APIC Versions from Firmware Repository...

What is the Target Version? : 2

[Check 7/37] Switches are all in Active state... PASS
[Check 8/37] NTP Status... PASS
[Check 9/37] Firmware/Maintenance Groups when crossing 4.0 Release... Versions not applicable N/A
[Check 10/37] Features that need to be Disabled prior to Upgrade... PASS
[Check 11/37] Switch Upgrade Group Guidelines... PASS
[Check 12/37] APIC Disk Space Usage (F1527, F1528, F1529 equipment-full)... PASS
[Check 13/37] Switch Node /bootflash usage... FAIL - UPGRADE FAILURE!!

  Pod-ID  Node-ID  Utilization    Alert
  1       1001     58.4411409341  Over 50% usage! Contact Cisco TAC for Support

VWNSPN1001# ls -l
total 6563128
-rw-rw-rw- 1 root root    4977087 Mar 25 03:02 CpuUsage.Log
-rw-rw-rw- 1 root root 1776793780 Nov  5  2019 aci-n9000-dk9.14.1.2u.bin
-rw-rw-rw- 1 root root 1961654688 Mar 25 22:00 aci-n9000-dk9.14.2.7f.bin
-rw-r--r-- 1 root root 1500113199 Aug 29  2019 auto-k
-rw-r--r-- 1 root root 1471706292 Nov  5  2019 auto-s
-rw-rw-rw- 1 root root          2 Nov  5  2019 diag_bootup
-rw-r--r-- 1 root root         54 Mar 25 22:50 disk_log.txt
-rw-rw-rw- 1 root root          7 Mar 25 22:00 imgDnldStatus
-rw-rw-rw- 1 root root        693 Nov  5  2019 libmon.logs
drwxr-xr-x 4 root root       4096 Aug 29  2019 lxc
-rw-r--r-- 1 root root    4383581 Mar 25 22:50 mem_log.txt
-rw-r--r-- 1 root root     724404 Nov  5  2019 mem_log.txt.old.gz
-rw-r--r-- 1 root root     180202 Mar 25 22:50 mts_buffer_log.log
-rw-rw-rw- 1 root root      41461 Feb 18 04:29 urib_api_log.txt

VWNSPN1001# show version
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Documents: http://www.cisco.com/en/US/products/ps9372/tsd_products_support_series_home.html
Copyright (c) 2002-2014, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php

Software
  BIOS: version 05.35
  kickstart: version 14.1(2u) [build 14.1(2u)]
  system: version 14.1(2u) [build 14.1(2u)]
  PE: version 4.1(2u)
  BIOS compile time: 05/10/2019
  kickstart image file is: /bootflash/aci-n9000-dk9.14.1.2u.bin
  kickstart compile time: 10/30/2019 11:58:04 [10/30/2019 11:58:04]
  system image file is: /bootflash/auto-s
  system compile time: 10/30/2019 11:58:04 [10/30/2019 11:58:04]

Hardware
  cisco N9K-C9364C ("supervisor")
  Intel(R) Xeon(R) CPU D-1526 @ 1.80GHz with 32695296 kB of memory.
  Processor Board ID FDO22350P3G

  Device name: VWNSPN1001
  bootflash: 125029376 kB

Kernel uptime is 871 day(s), 02 hour(s), 49 minute(s), 44 second(s)

Last reset at 26000 usecs after Tue Nov 05 21:01:55 2019 UTC
  Reason: reset-by-installer
  System version: 14.1(2m)
  Service: Upgrade

plugin
  Core Plugin, Ethernet Plugin
VWNSPN1001#

monrog2 commented 2 years ago
  1. We can work out doc wording that captures this scenario:

     If the target version was staged ahead of time in preparation for the upgrade, the automatic cleanup stops running until the upgrade has completed.

  2. We can update the result text to make this difference clear.

monrog2 commented 1 year ago

The updated logic should check the maintUpgJob MO to see whether the switch has already pre-loaded the switch image, and should not flag a switch that crossed the threshold if its "dnldStatus" is "downloaded":


{
  "maintUpgJob": {
    "attributes": {
      "creationDate": "2022-11-16T10:29:15.665-08:00",
      "desiredVersion": "n9000-16.0(1.295)",
      "dn": "topology/pod-1/node-1001/sys/fwstatuscont/upgjob",
      "dnldPercent": "100",
      "dnldStatus": "downloaded",           <--- !!!!!!!!!!!!
        ...
      "srUpg": "no",
      "startDate": "2022-11-16T10:15:22.894-08:00",
      "status": "",
      "upgradeStatus": "scheduled",           <--- !!!!!!!!!!!!
      "upgradeStatusStr": "Scheduled"
    }
  }
},
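
A minimal Python sketch of the check proposed above, for discussion only; it is not the script's actual implementation. It assumes the maintUpgJob objects arrive as a list of dicts shaped like the JSON above, and that bootflash usage is already available as (pod_id, node_id, percent) tuples. The helper names (`node_id_from_dn`, `predownloaded_nodes`, `bootflash_alerts`) and the input shapes are hypothetical.

```python
import re

THRESHOLD = 50.0  # percent; matches the "Over 50% usage!" alert text


def node_id_from_dn(dn):
    """Extract the node ID from a dn such as topology/pod-1/node-1001/..."""
    match = re.search(r"node-(\d+)", dn)
    return match.group(1) if match else None


def predownloaded_nodes(upg_jobs):
    """Return the set of node IDs whose maintUpgJob reports dnldStatus 'downloaded'."""
    nodes = set()
    for obj in upg_jobs:
        attrs = obj.get("maintUpgJob", {}).get("attributes", {})
        if attrs.get("dnldStatus") == "downloaded":
            node = node_id_from_dn(attrs.get("dn", ""))
            if node:
                nodes.add(node)
    return nodes


def bootflash_alerts(usage, upg_jobs, threshold=THRESHOLD):
    """Return usage rows at or over the threshold, skipping pre-staged nodes.

    usage is an assumed list of (pod_id, node_id, percent) tuples.
    """
    staged = predownloaded_nodes(upg_jobs)
    return [
        (pod, node, pct)
        for pod, node, pct in usage
        if pct >= threshold and node not in staged
    ]


if __name__ == "__main__":
    jobs = [{"maintUpgJob": {"attributes": {
        "dn": "topology/pod-1/node-1001/sys/fwstatuscont/upgjob",
        "dnldStatus": "downloaded",
    }}}]
    usage = [("1", "1001", 58.4411409341), ("1", "1002", 61.0)]
    for row in bootflash_alerts(usage, jobs):
        print(row)
```

With this sample input, node 1001 is skipped because its image is already downloaded, while node 1002 (a made-up node with no upgrade job) is still flagged.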