sandertyu closed this 3 years ago
Potentially create a script that notifies us when nodes need to be restarted to apply certain automatic Ubuntu updates.
If you want to, you can determine whether you need to reboot by checking for the file `/var/run/reboot-required`, and the `/var/run/reboot-required.pkgs` file tells you which packages are responsible for it.
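A minimal sketch of such a check, assuming the two paths mentioned above and that a plain text message is enough for whatever notification channel we use. The function names `reboot_status` and `format_notice` are hypothetical, and the de-duplication step assumes the `.pkgs` file may list the same package more than once.

```python
from pathlib import Path

# Paths taken from the comment above.
REBOOT_FLAG = Path("/var/run/reboot-required")
REBOOT_PKGS = Path("/var/run/reboot-required.pkgs")


def reboot_status(flag=REBOOT_FLAG, pkgs=REBOOT_PKGS):
    """Return (reboot_needed, packages) for this node.

    A reboot is needed when the flag file exists. The .pkgs file, if present,
    is read as one package name per line; repeated names are kept only once.
    """
    flag, pkgs = Path(flag), Path(pkgs)
    if not flag.exists():
        return False, []
    packages = []
    if pkgs.exists():
        for line in pkgs.read_text().splitlines():
            name = line.strip()
            if name and name not in packages:
                packages.append(name)
    return True, packages


def format_notice(hostname, needed, packages):
    """Build a one-line message for a hypothetical notification hook."""
    if not needed:
        return f"{hostname}: no reboot required"
    listed = ", ".join(packages) if packages else "unknown packages"
    return f"{hostname}: reboot required ({listed})"


if __name__ == "__main__":
    import socket

    needed, packages = reboot_status()
    print(format_notice(socket.gethostname(), needed, packages))
```

Run on each node (for example from cron), its output could be forwarded to email or chat; how it gets delivered is left open here.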
The cluster is highly available, so we could apply these updates by rebooting more often.
I don't think this is a good idea, because even though the cluster itself is HA, JupyterHub is not. If you restart the node running the hub pod, the hub will be inaccessible until the pod is respawned. Also, if we kill user pods, they won't respawn automatically and users could genuinely lose data. I think bundling this with cluster upgrades would be better; otherwise we would have to come up with a fairly complicated system to determine which nodes are safe to cordon and update only those.
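The core filtering step of that "which nodes are safe" system might look like the sketch below. It assumes we already have the node names and each pod's node and labels (gathered however we like), and it assumes the hub and user pods carry the labels `component=hub` and `component=singleuser-server`, as Zero to JupyterHub deployments typically use; check the labels on our own cluster first. The `Pod` type and `nodes_safe_to_cordon` function are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed label values for pods that must not be disrupted; verify these
# against the actual cluster before relying on them.
UNSAFE_COMPONENTS = {"hub", "singleuser-server"}


@dataclass
class Pod:
    name: str
    node: str
    labels: dict = field(default_factory=dict)


def nodes_safe_to_cordon(nodes, pods):
    """Return, sorted, the nodes that host neither the hub nor any user pod."""
    busy = {
        pod.node
        for pod in pods
        if pod.labels.get("component") in UNSAFE_COMPONENTS
    }
    return sorted(n for n in nodes if n not in busy)


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    pods = [
        Pod("hub-abc", "node-a", {"component": "hub"}),
        Pod("jupyter-alice", "node-b", {"component": "singleuser-server"}),
        Pod("coredns-1", "node-c", {"k8s-app": "kube-dns"}),
    ]
    print(nodes_safe_to_cordon(nodes, pods))
```

This only answers "which nodes", and only at one instant: a user server could be scheduled onto a node between the check and the reboot, which is part of why such a system gets complicated. Actually cordoning and draining would still be separate steps (e.g. `kubectl cordon` / `kubectl drain`).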
I'll try to tackle this.
We've got a satisfactory number of automated scripts, and they are all set up through Puppet. Closing.
Currently there are 2 cron jobs running on the `gravity` management node: one that scrubs the file server `blackhole` each month and reports back the findings, and another that emails us every 4 months to upgrade the cluster (view these with `sudo crontab -e` on `gravity`). Some suggested changes: