Closed prakashsurya closed 11 months ago
For my knowledge, how does one quiesce VDBs while the stack is down?
you can't.. by "user" in that statement, I meant whatever happens to be orchestrating the upgrade.. which is the product's upgrade Java logic in this case.
If a FULL upgrade fails for any reason right after the execute script finishes, what could the recovery look like? We'd have to presumably now depend on the stack to successfully start such that VDBs are quiesced before the engine is rebooted which may not always be dependable as we know from recent escalations.
it'll depend exactly where the failure occurs.. but generally, if execute is run, but doesn't complete.. the fix should be to re-run execute manually, as that script is idempotent.. VDBs will be quiesced and a reboot triggered on stack start up, after execute runs to completion, and restarts the mgmt service..
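To illustrate why re-running an idempotent script is a safe recovery step, here is a minimal sketch in Python. It is not the actual execute script: the step names, the `completed` set standing in for on-disk state, and the `fail_at` parameter are all hypothetical, chosen only to show that a second run picks up where a failed run stopped.

```python
from typing import List, Optional, Set

# Hypothetical step names; the real script's internals are not shown in this PR.
STEPS = ["upgrade_packages", "restart_services"]


def run_execute(completed: Set[str], fail_at: Optional[str] = None) -> List[str]:
    """Run each step not yet completed; return the steps performed this run.

    `completed` stands in for whatever persistent state the real script
    inspects (an assumption made for illustration).
    """
    performed = []
    for step in STEPS:
        if step in completed:
            continue  # already done by an earlier run, so skip it
        if step == fail_at:
            raise RuntimeError(f"simulated failure during {step}")
        completed.add(step)
        performed.append(step)
    return performed
```

In this sketch, a run that fails partway leaves the earlier steps recorded, a manual re-run performs only what remains, and a further re-run is a no-op.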
in the worst of cases, where the stack doesn't come up, a "hard" reboot should be fine.. it'd be no different than a kernel crash.. sure, perhaps VDBs might not behave properly due to not having been quiesced, but that can happen at any point outside of upgrade, via a kernel panic..
Context
We intend to modify the upgrade logic so that a FULL upgrade doesn't reboot until after a stack restart. This lets us wait to quiesce VDBs until we're running the stack on the new version. For more context on the motivation for doing this, see CP-10570.
Problem
Currently, when the "execute" script is run for a FULL upgrade, it automatically reboots the system after packages are upgraded. This conflicts with the goals of CP-10570, as "execute" is run from the "old version", rather than the "new version".
Solution
This PR modifies "execute" to only restart the delphix services, regardless of whether a FULL or DEFERRED upgrade is requested. This means that, as far as the scripts are concerned, there is no longer any difference between a FULL and a DEFERRED upgrade. With the accompanying virtualization changes, however, the required reboot for a FULL upgrade will now be performed by the virtualization product's upgrade logic instead.
The intention is for the virtualization product's upgrade logic to run the "execute" script to perform the package upgrades as necessary, and restart the application. Then, when the application starts back up on the new version, it'll detect at stack startup time that a FULL upgrade was in progress, and automatically initiate the necessary logic to quiesce VDBs and perform the reboot.
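The startup-time decision described above can be sketched roughly as follows. This is only an illustration in Python, not the product's Java logic: the function name `on_stack_startup`, the `pending_upgrade` argument, and the injected `quiesce_vdbs` and `reboot` callables are all hypothetical. How the real stack records that a FULL upgrade is in progress is not covered in this PR.

```python
from typing import Callable, List, Optional


def on_stack_startup(
    pending_upgrade: Optional[str],
    quiesce_vdbs: Callable[[], None],
    reboot: Callable[[], None],
) -> List[str]:
    """Decide what to do when the stack starts on the new version.

    `pending_upgrade` is a hypothetical value ("FULL", "DEFERRED", or None)
    describing the upgrade that restarted the stack. Returns the ordered
    list of actions taken, which makes the ordering easy to check.
    """
    actions: List[str] = []
    if pending_upgrade == "FULL":
        # Quiesce first, so VDBs are not caught mid-flight by the reboot.
        quiesce_vdbs()
        actions.append("quiesce")
        reboot()
        actions.append("reboot")
    # DEFERRED upgrades and normal startups need no reboot here.
    return actions
```

The point of the sketch is the ordering: the quiesce happens on the new version and always precedes the reboot, while a DEFERRED upgrade or an ordinary startup does neither.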
One caveat to the approach taken in this PR is that any consumers using the upgrade scripts to perform a FULL upgrade will now need to reboot the system manually. Arguably this fixes a bug, since previously a FULL upgrade via the scripts would not quiesce VDBs, and thus the reboot could cause problems for VDBs; i.e. it's now up to the user to quiesce VDBs after the upgrade, and perform the reboot.
Related Work
https://github.com/delphix/dlpx-app-gate/pull/1389
Testing
git-ab-pre-push is here