CiscoDevNet / Hyperflex-Hypercheck

Perform pro-active self checks on your Hyperflex cluster to ensure stability and resiliency
MIT License

Bug CSCvu05040: Check for HX down during upgrade #37

Closed raghusel closed 3 years ago

raghusel commented 3 years ago

For ESXi releases earlier than 6.7 U3 build 16316930, the actual failback delay interval can be 100 ms even when the output of "esxcli system settings advanced list | grep TeamPolicyUpDelay -A2" shows 30 sec, because of bug CSCvu05040. On unpatched releases, "netdbg vswitch runtime get" needs to be run to verify the ESXi failback delay timer that is actually in effect.

Currently, "Check for HX down during upgrade" checks the failback timer value. The code runs only "esxcli system settings advanced list | grep TeamPolicyUpDelay -A2 | grep Int | cut -d ':' -f2 | cut -d ' ' -f2" to read the TeamPolicyUpDelay value, but that value may not reflect the real delay if the host is hitting the bug. For ESXi 6.7 releases, the netdbg command also needs to be run to verify the failback delay interval, and this is not currently included in the code.

Tested this in a lab where ESXi was running 6.7 U2 with a failback timer of 100 ms, and the hypercheck script still passed the test.
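
The gap can be illustrated with a minimal sketch, assuming both timer values have already been read and converted to milliseconds. The helper names `check_configured_only` and `check_configured_and_runtime` are hypothetical and are not part of the Hypercheck code; the 30000 ms threshold is the 30 sec value discussed in this report.

```python
# Hypothetical helpers for illustration only; values are failback delays in ms.
THRESHOLD_MS = 30000  # 30 sec


def check_configured_only(esxcli_ms):
    """Decide from the esxcli-reported value alone."""
    return "PASS" if esxcli_ms >= THRESHOLD_MS else "FAIL"


def check_configured_and_runtime(esxcli_ms, runtime_ms):
    """Require both the esxcli value and the netdbg runtime value to meet the threshold."""
    return "PASS" if min(esxcli_ms, runtime_ms) >= THRESHOLD_MS else "FAIL"


# Scenario from the bug description: esxcli shows 30 sec, runtime delay is 100 ms.
print(check_configured_only(30000))              # PASS
print(check_configured_and_runtime(30000, 100))  # FAIL
```

Looking only at the esxcli value reports PASS for a host whose effective delay is 100 ms, while a check that also considers the runtime value reports FAIL.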

raghusel commented 3 years ago

Suggesting the code below for the check: if the host is running ESXi 6.7, the output of both commands must be at least 30 sec for the test to pass.

        # 8) Check for HX down during upgrade
        check_HX_down_status = ""
        try:
            # Configured failback delay in ms, as reported by esxcli
            cmd = "esxcli system settings advanced list | grep TeamPolicyUpDelay -A2 | grep Int | cut -d ':' -f2 | cut -d ' ' -f2"
            delays = [execmd(cmd)[0].strip()]
            # ESXi version string, e.g. "6.7.0"
            cmd = "vmware -l | cut -d ' ' -f3"
            ESXi_67 = execmd(cmd)[0].strip() == "6.7.0"
            if ESXi_67:
                # On 6.7 the esxcli value may not reflect the runtime delay
                # (CSCvu05040), so also read the value from netdbg
                cmd = "netdbg vswitch runtime get | grep TeamPolicyUpDelay | cut -d ':' -f2"
                delays.append(execmd(cmd)[0].strip())
            # Decide only when every collected value is a plain integer
            if all(d.isdigit() for d in delays):
                if any(int(d) < 30000 for d in delays):
                    check_HX_down_status = "FAIL"
                else:
                    check_HX_down_status = "PASS"
            opd["Check for ESXI Failback timer"] = check_HX_down_status
        except Exception:
            # Command output missing or unusable: leave the result unset
            pass
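
The version test could be narrowed to the affected builds only. The following is a sketch under stated assumptions, not the fix that was merged: `needs_runtime_check` is a hypothetical helper, it assumes `vmware -v` prints a string of the form "VMware ESXi 6.7.0 build-<number>", and it assumes only 6.7.0 builds older than 16316930 (the build named in this issue) need the netdbg check. The build numbers in the usage lines are illustrative.

```python
import re

FIXED_BUILD = 16316930  # 6.7 U3 build named in this issue


def needs_runtime_check(vmware_v_output):
    """Return True when the netdbg runtime value should also be verified.

    Assumes input like 'VMware ESXi 6.7.0 build-13006603'. Output that
    cannot be parsed returns True so that the stricter check is used.
    """
    m = re.search(r"ESXi (\d+\.\d+\.\d+) build-(\d+)", vmware_v_output)
    if not m:
        return True
    version, build = m.group(1), int(m.group(2))
    return version == "6.7.0" and build < FIXED_BUILD


print(needs_runtime_check("VMware ESXi 6.7.0 build-13006603"))  # True
print(needs_runtime_check("VMware ESXi 6.7.0 build-16316930"))  # False
```

Falling back to the stricter check on unparseable output is a design choice made here so that an unexpected version string does not hide a short runtime delay.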

afrahmad commented 3 years ago

Thanks for writing to us. The Net.TeamPolicyUpDelay timer check for ESXi 6.7 has been updated.