Closed sgibson91 closed 2 years ago
I have confirmed that this is also the case for the prod hub. Hence, this is either an issue in the common config, or with the cluster itself.
I think we should prioritise investigating this issue quite highly since openscapes will continue to fail in CI/CD (and hence require manual upgrading) until it is fixed.
Hope it's ok that I self-assigned this; I plan to look into it soon.
I believe the issue is related to https://github.com/jupyterhub/kubespawner/pull/631.
What I think is happening is:
- The deployer starts a server for the deployment-service-check user, but without passing it any user options.
- In kubespawner, only the user options are taken into account when loading a profile and the defaults aren't, hence the hub doesn't know which kind of server to spawn.
I opened https://github.com/jupyterhub/kubespawner/pull/631 to hopefully fix this upstream.
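To make the suspected behaviour concrete, here is a toy model of it. This is not kubespawner's actual code: the function names and the profile list are made up for illustration, and the assumption is only that a profile is looked up from the user options via a `profile` key and that a profile can be marked `default: true`.

```python
# Toy model of the suspected bug; NOT kubespawner's real implementation.
# All names below are hypothetical and exist only for illustration.

profile_list = [
    {"display_name": "Small", "slug": "small", "default": True},
    {"display_name": "Large", "slug": "large"},
]


def pick_profile_ignoring_default(profiles, user_options):
    """Suspected behaviour: only user_options are consulted."""
    slug = user_options.get("profile")
    for profile in profiles:
        if profile["slug"] == slug:
            return profile
    # No user option was passed, so nothing is selected.
    return None


def pick_profile_with_default(profiles, user_options):
    """Expected behaviour: fall back to the profile marked default."""
    chosen = pick_profile_ignoring_default(profiles, user_options)
    if chosen is not None:
        return chosen
    for profile in profiles:
        if profile.get("default"):
            return profile
    return None


# The deployer passes no user options, so the first variant finds nothing,
# while the second falls back to the default profile.
no_options = {}
broken = pick_profile_ignoring_default(profile_list, no_options)
fixed = pick_profile_with_default(profile_list, no_options)
```

With empty user options, the first function returns nothing while the second returns the profile flagged as default, which is roughly the difference I'd expect the upstream fix to make.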
In the meantime, for the 2i2c CI/CD to pass, maybe we could hack into https://github.com/2i2c-org/infrastructure/blob/02cb6b7862198a29f010ad416d5dc486d7390201/deployer/tests/test_hub_health.py#L45-L54 and pass it a dict of user options, only when we're deploying the openscapes hub, that tells it which kind of server to spawn.
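A rough sketch of what that temporary workaround could look like. The helper name, the `openscapes` cluster check, and the `small` profile slug are all assumptions for illustration; the real test code and the slug to use would need checking against the hub's config.

```python
# Hypothetical helper for the temporary workaround; names are illustrative
# and not taken from the deployer code.

# Clusters whose hubs need an explicit profile until the upstream fix lands.
CLUSTERS_NEEDING_PROFILE = {"openscapes"}


def user_options_for(cluster_name, profile_slug="small"):
    """Return user options for the health check, or None if not needed.

    FIXME: remove once the upstream kubespawner fix is available.
    """
    if cluster_name in CLUSTERS_NEEDING_PROFILE:
        # Assumes the spawner reads the chosen profile from a "profile" key.
        return {"profile": profile_slug}
    return None


openscapes_options = user_options_for("openscapes")
other_options = user_options_for("some-other-cluster")
```

Every other hub would keep spawning without user options, so the hack stays scoped to the one cluster that is failing.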
Alternatively, we could skip checking the staging hub's health and allow the upgrade to happen to the prod hub too, since we're doing this manually anyway.
@GeorgianaElena Should we keep this open until the upstream PR is merged and we can remove the fix, or should we track that in a new issue?
Hmm, I'm not sure. My thinking was that the #fixme comment in the temporary fix code is enough to close this issue, and that we can then track the upstream kubespawner PR as part of https://github.com/2i2c-org/infrastructure/issues/1055 (maybe as a small check box there)? I believe the kubespawner version we're using comes from z2jh anyway. What do you think?
My only concern is that we don't develop much in the deployer any more and so it might take a while to rediscover and remember to fix the #fixme. However, if it will be harmless after #1055 then I don't think my concern should be a blocker.
I just opened https://github.com/2i2c-org/infrastructure/issues/1643 and added it to the list of tasks to tackle alongside https://github.com/2i2c-org/infrastructure/issues/1055, in case that upgrade comes with the upstream fix, just to be sure.
Thank you @sgibson91 ✨
Thanks for opening the tracker issue, @GeorgianaElena!!
I am now seeing this problem on CarbonPlan (AWS) after https://github.com/2i2c-org/infrastructure/pull/1642#issuecomment-1232771961. Both of these hubs use the environment chooser, so I wonder if this bug is tied to that config in some way?
Sorry, just realised from your comment that it happens when we set default: true for a profile.
Context
The deployer is failing to create the deployment-service-check user (specifically on the staging hub), which is why CI/CD is failing.