Jose-Matsuda closed this issue 2 years ago.
Ideally I can just add the `imagePullPolicy` to this line:

```bash
kubectl patch Notebook "$notebookname" --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"k8scc01covidacr.azurecr.io/'"$i"':c5b7982c"}]' --namespace "$namespace"
```
Yup, running this works (without `--dry-run` it actually applied the update, and the StatefulSet updated accordingly). Note that I used `Never` here just to change the value:

```bash
kubectl patch Notebook patchtest --dry-run=client --type='json' -p='[
{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"k8scc01covidacr.azurecr.io/jupyterlab-cpu:v1"},
{"op": "replace", "path": "/spec/template/spec/containers/0/imagePullPolicy", "value":"Never"}
]' --namespace jose-matsuda
```
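A minimal bash sketch of building the same two-operation patch from parameters, instead of splicing `$i` into a single-quoted string. `notebook_patch` is a hypothetical helper name; the container paths are the ones used above. Per RFC 6902, `replace` fails if the target path does not already exist, so if some Notebooks never had `imagePullPolicy` set, `add` (which creates or overwrites an object member) may be the safer op for that field.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints a JSON Patch that sets both the image and the
# imagePullPolicy of container 0, matching the paths in the commands above.
# Arguments are inserted verbatim, so they must not contain double quotes.
notebook_patch() {
  local image="$1" pull_policy="$2"
  local base="/spec/template/spec/containers/0"
  printf '[{"op":"replace","path":"%s/image","value":"%s"},{"op":"replace","path":"%s/imagePullPolicy","value":"%s"}]\n' \
    "$base" "$image" "$base" "$pull_policy"
}

# Example (drop --dry-run=client to actually apply):
#   kubectl patch notebook patchtest --dry-run=client --type=json \
#     -p "$(notebook_patch k8scc01covidacr.azurecr.io/jupyterlab-cpu:v1 Never)" \
#     --namespace jose-matsuda
```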
Resolution Script
https://gist.github.com/Jose-Matsuda/61ec40d175fadfff045be5be481dc7b9
Ran successfully on dev. Note that you need to remove the `--dry-run` flag if you want it to actually apply the patches. After every 20 workload patches there is a 5-second sleep, though it could probably be longer (honestly, 10 seconds), since pods take longer than that to come back up.
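The batching described here (a pause after every 20 patches) could look roughly like the following bash sketch. It is not the gist's code: `patch_in_batches` and `patch_one` are hypothetical names, `patch_one` is a hook you would define to run the actual `kubectl patch`, and the 10-second default is simply the value suggested above.

```bash
#!/usr/bin/env bash
set -euo pipefail

BATCH_SIZE="${BATCH_SIZE:-20}"        # patches between pauses
SLEEP_SECONDS="${SLEEP_SECONDS:-10}"  # pause length; pods may need longer to restart

# Reads "namespace name" pairs from stdin, calls patch_one for each, and
# sleeps after every BATCH_SIZE patches so rescheduled pods can come back up.
patch_in_batches() {
  local count=0 ns name
  while read -r ns name; do
    if [[ -z "$ns" ]]; then continue; fi   # skip blank lines
    patch_one "$ns" "$name"
    count=$((count + 1))
    if (( count % BATCH_SIZE == 0 )); then
      echo "patched $count so far, sleeping ${SLEEP_SECONDS}s" >&2
      sleep "$SLEEP_SECONDS"
    fi
  done
}

# Example: feed it every Notebook in the cluster ($PATCH is a placeholder).
#   patch_one() { kubectl patch notebook "$2" --namespace "$1" --type=json -p "$PATCH"; }
#   kubectl get notebooks --all-namespaces \
#     -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' \
#     | patch_in_batches
```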
Unfortunately, I forgot to tell Souheil to also change the default `imagePullPolicy` in the spawner config, so I had to account for that in this PR: https://github.com/StatCan/aaw-kubeflow-manifests/pull/189/files#diff-364a9e35e63c4516d98101fc647536dd0e712382eabd4c41913e1806f571b731R51

So you will likely need to change the script to select every image regardless of tag, so that the `imagePullPolicy` gets changed on all of them as well.
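Selecting every Notebook by registry rather than by one specific tag could be a simple prefix match. A bash sketch, assuming the input lines come from a `kubectl get notebooks` JSONPath query that also prints container 0's image; `select_registry_notebooks` is a hypothetical name, and the registry prefix is the one used in the commands above.

```bash
#!/usr/bin/env bash
set -euo pipefail

REGISTRY="${REGISTRY:-k8scc01covidacr.azurecr.io}"

# Reads "namespace name image" lines and prints those whose image comes from
# $REGISTRY, whatever the tag (v1, c5b7982c, ...), so that every one of them
# can also have its imagePullPolicy patched.
select_registry_notebooks() {
  local ns name image
  while read -r ns name image; do
    case "$image" in
      "$REGISTRY"/*) printf '%s %s %s\n' "$ns" "$name" "$image" ;;
    esac
  done
}

# Example input source:
#   kubectl get notebooks --all-namespaces \
#     -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.spec.template.spec.containers[0].image}{"\n"}{end}' \
#     | select_registry_notebooks
```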
CC @chuckbelisle to action this when he finds time. You can copy and paste the gist here, but you will need to remove the `--dry-run=client` on line 29; that is deliberate, so that you consciously choose to execute it.
There is also the `sleep 7` on line 25, which runs after every 20 `kubectl` restarts. We have around 400 notebooks that need to be restarted, i.e. 400 / 20 = 20 pauses × 7 s, which is around 140 seconds of built-in delay overall (which honestly may not be enough).
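Since a fixed sleep may not be enough, one alternative (not what the gist does) is to wait for each restarted StatefulSet to finish rolling out before moving on, using `kubectl rollout status`. This bash sketch assumes the notebook controller names the StatefulSet after the Notebook; `wait_for_notebook` and the 300-second timeout are hypothetical choices.

```bash
#!/usr/bin/env bash
set -euo pipefail

ROLLOUT_TIMEOUT="${ROLLOUT_TIMEOUT:-300s}"

# Blocks until the Notebook's StatefulSet reports a completed rollout, or fails
# after ROLLOUT_TIMEOUT. Assumes StatefulSet name == Notebook name.
wait_for_notebook() {
  local ns="$1" name="$2"
  kubectl rollout status "statefulset/$name" --namespace "$ns" --timeout="$ROLLOUT_TIMEOUT"
}
```

Calling it right after each `kubectl patch` makes the pace follow actual restart time instead of a guessed delay. One caveat: the controller may take a moment to update the StatefulSet after the Notebook is patched, so a short sleep before waiting might still be needed.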
Changed the sleep time to 15 seconds.

Executed the patch script on July 1st, 2022 @ 10:15pm.
Incredibly similar to https://github.com/StatCan/daaas/issues/976 (would just need to add remote desktop to the list to iterate through). We also want to patch the imagePullPolicy at the same time if possible.
Reasoning
This is to facilitate a weekly cronjob that will restart user workloads iff their version of "v1" is "older", i.e. their image digest does not match the most recent one (say, because a push was triggered to the aaw-kubeflow-containers `master` branch).

Concerns
As in https://github.com/StatCan/daaas/issues/983, this script will run just fine, but with a lot of pods being rescheduled at once we may bring Gatekeeper to its knees again.
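For the weekly cronjob described under Reasoning, the restart decision comes down to comparing the digest a pod is running with the latest digest for `v1`. A bash sketch under stated assumptions: the running digest is read from the notebook pod's `imageID` (for example `kubectl get pod <pod> -n <ns> -o jsonpath='{.status.containerStatuses[0].imageID}'`), how the latest digest is fetched from the registry is left open, and `digest_of` / `needs_restart` are hypothetical names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Prints the "sha256:..." part of an image reference or imageID, e.g.
# "docker-pullable://k8scc01covidacr.azurecr.io/jupyterlab-cpu@sha256:abc" -> "sha256:abc".
# Input without an "@" is returned unchanged (assumed to already be a digest).
digest_of() {
  printf '%s\n' "${1##*@}"
}

# Exit status 0 ("restart") when the pod runs an older v1, i.e. its digest is
# known and differs from the latest one; 1 otherwise.
needs_restart() {
  local running latest
  running="$(digest_of "$1")"
  latest="$(digest_of "$2")"
  [[ -n "$running" && "$running" != "$latest" ]]
}
```

A cronjob would then loop over the notebooks and restart only those for which `needs_restart` succeeds, which also limits how many pods get rescheduled in one run.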