cdot65 / pan-os-upgrade

An efficient tool to execute configuration backups, network state snapshots, system readiness checks, and operating system upgrades of Palo Alto Networks firewalls and Panorama appliances.
https://cdot65.github.io/pan-os-upgrade/
Apache License 2.0

Inconsistent HA Peer Upgrade Order Due to Thread Limitations #87

Closed cdot65 closed 6 months ago

cdot65 commented 6 months ago

Description

When batch-upgrading firewalls configured in High Availability (HA) mode, the upgrade order is inconsistent, particularly when the number of targeted firewalls exceeds the available thread count. As a result, an HA peer firewall may be upgraded before the active unit. This matters because the script runs in two phases: the first enforces HA synchronization, and the second relaxes that requirement. When an HA peer is upgraded ahead of the active unit, the later attempt to upgrade the active firewall can fail, because the version mismatch between the peers means the HA synchronization requirement is not met.
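To illustrate why the order varies, here is a minimal sketch of a bounded thread pool processing more devices than it has workers. It is not the script's actual code: the `upgrade` stub, the `run_batch` helper, and the hostnames are hypothetical, and the sleep only simulates upgrades of differing duration.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def upgrade(hostname: str) -> str:
    """Hypothetical stand-in for a per-firewall upgrade; sleeps for a random time."""
    time.sleep(random.uniform(0.01, 0.05))
    return hostname


def run_batch(hostnames: list[str], max_workers: int = 2) -> list[str]:
    """Submit every device to a bounded pool and record the completion order."""
    completed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(upgrade, name) for name in hostnames]
        # as_completed yields futures as they finish, not in submission order.
        for future in as_completed(futures):
            completed.append(future.result())
    return completed


if __name__ == "__main__":
    # Two HA pairs (four devices) but only two worker threads.
    devices = ["fw1-active", "fw1-passive", "fw2-active", "fw2-passive"]
    print(run_batch(devices, max_workers=2))
```

With four devices and two workers, the last two devices wait for a free thread, so the completion order can differ between runs and a peer may finish before its active unit has started.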

Impact

This issue can disrupt the intended sequential upgrade process in HA configurations. The active firewall may be left unable to upgrade because its peer is already on a different version, which fails the HA synchronization check in the script's initial phase.

Proposed Solution

Relax the HA synchronization requirement in both phases of the script. This accommodates cases where the upgrade sequence cannot be strictly controlled because of threading limitations, so that upgrades can proceed even when HA peers are temporarily out of sync because they run different versions.

Steps to Reproduce

Expected Behavior

The script should apply the HA synchronization requirement flexibly, allowing upgrades to proceed even when an HA peer has been upgraded first because of threading constraints.

Actual Behavior

The script strictly enforces HA synchronization in its initial phase. If the active unit's HA peer has already been upgraded, the resulting version discrepancy fails the HA sync check and can halt the upgrade process.

Suggested Enhancement

Modify the ha_sync_check_firewall function (and any related logic) to relax the HA synchronization checks throughout the script's operation, so that the non-deterministic upgrade order caused by threading limitations no longer blocks upgrades. This should make the script more resilient in complex HA configurations.
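One possible shape for a relaxed check is sketched below. It is an illustration under stated assumptions, not the project's implementation: the function name `ha_sync_check_sketch`, the `strict` flag, and the `sync_state` key in the HA details dictionary are all hypothetical, and the real ha_sync_check_firewall signature and data layout may differ.

```python
import logging

logger = logging.getLogger(__name__)


def ha_sync_check_sketch(ha_details: dict, strict: bool = False) -> bool:
    """Return True if the HA peers report as synchronized.

    Assumption: `ha_details` carries a hypothetical "sync_state" key whose
    value is "synchronized" when the peers are in sync.
    """
    if ha_details.get("sync_state") == "synchronized":
        return True

    if strict:
        # Strict mode: an out-of-sync pair stops the upgrade.
        raise RuntimeError("HA peers are not synchronized")

    # Relaxed mode: log the condition and let the caller continue.
    logger.warning("HA peers are not synchronized; continuing anyway")
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # A peer upgraded first leaves the pair out of sync; relaxed mode proceeds.
    print(ha_sync_check_sketch({"sync_state": "not synchronized"}))
```

Defaulting to the relaxed behavior in both phases matches the proposal above, while keeping a `strict` option would let callers opt back in to the original check where a guaranteed order is available.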