LibreTexts / metalc

LibreTexts/UCDavis bare-metal Kubernetes cluster running JupyterHub and Binder
http://jupyter.libretexts.org
MIT License

Scripts/Cronjobs for Galaxy cluster #224

Closed: sandertyu closed this issue 3 years ago

sandertyu commented 3 years ago

Currently there are two cronjobs running on the gravity management node: one scrubs the file server blackhole each month and reports its findings, and the other emails us every 4 months with a reminder to upgrade the cluster (view these with sudo crontab -e on gravity). Some suggested changes:

rkevin-arch commented 3 years ago

> Potentially create a script which notifies us when the nodes need to be restarted to apply certain automatic Ubuntu updates.

If you want to, you can tell whether a reboot is needed by checking whether the file /var/run/reboot-required exists; the /var/run/reboot-required.pkgs file lists the packages responsible for it.
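A minimal sketch of that check in Python, assuming the standard Ubuntu behaviour where the flag file exists only when a reboot is pending. The function name `reboot_status` and its parameters are illustrative, not something already present on the cluster, and sending the notification itself is left out.

```python
from pathlib import Path


def reboot_status(flag="/var/run/reboot-required",
                  pkgs="/var/run/reboot-required.pkgs"):
    """Return (needs_reboot, packages) based on the Ubuntu reboot flag files.

    Both paths are parameters only so the function can be exercised
    against temporary files; the defaults are the paths named above.
    """
    if not Path(flag).exists():
        return False, []

    packages = []
    pkgs_path = Path(pkgs)
    if pkgs_path.exists():
        # The .pkgs file holds one package name per line, sometimes repeated.
        lines = pkgs_path.read_text().splitlines()
        packages = sorted({line.strip() for line in lines if line.strip()})
    return True, packages


if __name__ == "__main__":
    needed, packages = reboot_status()
    if needed:
        print("Reboot required by: " + (", ".join(packages) or "unknown packages"))
    else:
        print("No reboot required")
```

Run from a cronjob on each node, the printed line could be mailed out the same way the existing cronjobs report their results.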

> The cluster is highly available so we could apply these updates by rebooting more often.

I don't think this is a good idea, because even though the cluster itself is HA, JupyterHub is not. If you restart the node running the hub pod, the hub will be inaccessible until the pod is respawned. Also, if we kill user pods, they won't respawn automatically and the user could genuinely lose data. I think bundling this with cluster upgrades would be better; otherwise we would have to come up with a fairly complicated system to determine which nodes are safe to cordon and only update those.
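To give a rough idea of what the "safe to cordon" check might look like, here is a sketch in Python. It assumes pods have already been reduced to simple dicts with a `node` and a `component` field (for example from the output of `kubectl get pods -o json`), and that hub and user pods carry the `component` labels `hub` and `singleuser-server`. Those label values and the function name `safe_to_cordon` are assumptions for illustration, not something confirmed for this cluster.

```python
def safe_to_cordon(nodes, pods, blocking=("hub", "singleuser-server")):
    """Return the nodes that host no hub pod and no user pod.

    nodes:    iterable of node names
    pods:     iterable of dicts like {"node": "chick1", "component": "hub"}
              (a simplified, hypothetical shape, not the raw Kubernetes API)
    blocking: component labels that make a node unsafe to restart
    """
    # Any node running a blocking pod must be left alone.
    blocked = {pod["node"] for pod in pods if pod.get("component") in blocking}
    return sorted(node for node in nodes if node not in blocked)


if __name__ == "__main__":
    # Made-up node names and pods, purely for illustration.
    example_nodes = ["chick1", "chick2", "chick3"]
    example_pods = [
        {"node": "chick1", "component": "hub"},
        {"node": "chick2", "component": "singleuser-server"},
        {"node": "chick3", "component": "proxy"},
    ]
    print(safe_to_cordon(example_nodes, example_pods))
```

Even this simple version shows the problem: on a busy day most nodes will be running at least one user pod, so few of them would ever qualify.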

pmackle commented 3 years ago

I'll try tackling this

sandertyu commented 3 years ago

We've got a satisfactory number of automated scripts, and they are all set up through Puppet. Closing.