Upgrade fails when first deployment fails

berkeley-dsep-infra / hubploy

Toolkit to deploy many z2jh based JupyterHubs

BSD 3-Clause "New" or "Revised" License

Upgrade fails when first deployment fails #25

Open tjcrone opened 5 years ago

tjcrone commented 5 years ago

When a helm deployment fails because of a timeout, which often happens for the first deployment on a new cluster because image pulls can take a long time, the first deployment will be listed as FAILED. Running hubploy again then always fails with Error: UPGRADE FAILED: "ooi-staging" has no deployed releases, because there is no successful deployment to upgrade. The only solution is to delete the deployment entirely and try again. Is there a way to identify this situation and, rather than attempting an upgrade that is certain to fail, deploy again from scratch?
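One way the check described above might look, as a rough sketch rather than anything hubploy does today. It assumes that helm history <release> --output json returns a list of revisions that each carry a status field, and that a non-zero exit means the release does not exist. The function names choose_action and release_statuses are hypothetical, and namespace handling is left out.

```python
import json
import subprocess


def choose_action(statuses):
    """Pick a deploy action from the statuses of a release's revisions.

    Simplification (assumption): an upgrade is only attempted when at
    least one revision is currently marked as deployed.
    """
    # Helm versions differ in letter case (FAILED vs failed), so normalize.
    normalized = [s.lower() for s in statuses]
    if not normalized:
        return "install"  # no release history at all: plain first install
    if "deployed" in normalized:
        return "upgrade"  # there is a deployed revision to upgrade from
    return "purge_then_install"  # only failed revisions: delete, then install


def release_statuses(release):
    """Return the revision statuses for a release, or [] if helm reports none."""
    result = subprocess.run(
        ["helm", "history", release, "--output", "json"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Assumption: treated as "release not found"; other errors are not
        # distinguished in this sketch.
        return []
    return [revision["status"] for revision in json.loads(result.stdout)]
```

The decision is kept separate from the helm call so it can be exercised without a cluster, for example choose_action(["FAILED"]) returns "purge_then_install".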

tjcrone commented 5 years ago

Just tried the delete-rerun-workflow sequence and, of course, instead of the "no deployed releases" error, got the classic "timed out waiting for the condition". So getting a deployment going on a new cluster involves a delete-try-again-cross-fingers-repeat routine, which I feel could be better. Any ideas on how I might improve this? Worth noting: once one deployment is in, that changes everything, because an upgrade can then work, at least in theory.

tjcrone commented 5 years ago

We could add the --force option to the upgrade, in which case it will try to install over the top of the first failed release. Are there downsides to adding this option to the helm upgrade command?
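For illustration only, this is roughly how the flag could be made opt-in when the command is assembled. The helper name helm_upgrade_command, its parameters, and the use of --install are assumptions for this sketch, not hubploy's actual code, and what --force does to existing resources varies between Helm versions.

```python
def helm_upgrade_command(release, chart, force=False):
    """Build the argument list for a helm upgrade call.

    Hypothetical helper: hubploy's real command includes other options
    that are omitted here.
    """
    cmd = ["helm", "upgrade", "--install", release, chart]
    if force:
        # Opt-in, since forcing may replace resources of a failed release.
        cmd.append("--force")
    return cmd
```

Keeping the flag behind a parameter would let it be used only for the failed-first-deployment case instead of on every upgrade.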