GeoscienceAustralia / dea-sandbox

Digital Earth Australia Sandbox config and planning
Apache License 2.0

Support non-writeable (overfull) home directory #285

Open benjimin opened 2 months ago

benjimin commented 2 months ago

In JupyterHub deployments, we're inclined to mount a user's personal volume at the home directory. If this storage volume fills up, their login reportedly fails, necessitating admin intervention to enlarge the volume or clean up its contents. This makes for a bad user experience (being abruptly locked out while waiting for human support, or having to exercise ongoing caution to avoid the hazard) and adds support workload for admins (potentially leading to excessive storage allocation just to minimise disruption).

We should ensure that this container image runs fine regardless (i.e. starts up and serves a functional Jupyter web interface), so that the JupyterHub user can log in and clean up their home directory contents using the navigator pane (without admin involvement). It is reasonable for them to get errors when attempting to save notebooks until they free some space, but a full volume shouldn't cause login to abort.

Note, even if the user volume is full, there is presumably still scratch space available elsewhere in the container's filesystem. (Also, note there are multiple options for how a git sync into the home directory can be orchestrated, for example, a bespoke script in the container image versus a common sidecar provided by Z2JH.) We should be able to test the behaviour by having a user `dd` a large file into existence, along the lines of the sketch below. (Could also create a docker compose file for a regression test.)
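A minimal local reproduction might look like the following, assuming an image name of `geoscienceaustralia/sandbox:latest` and the usual `/home/jovyan` home directory (both are assumptions, not confirmed from this repo). Mounting a deliberately tiny tmpfs over the home directory roughly mimics a full user volume:

```bash
# Hypothetical repro: mount a tiny tmpfs over the home directory, fill it,
# then check whether the Jupyter interface still starts and serves.
docker run --rm -it -p 8888:8888 \
  --tmpfs /home/jovyan:rw,size=1m \
  geoscienceaustralia/sandbox:latest \
  bash -c 'dd if=/dev/zero of=$HOME/bigfile bs=64k; jupyter lab --ip=0.0.0.0'
```

The `dd` runs until the tmpfs is exhausted and exits with an error; the `;` (rather than `&&`) lets Jupyter attempt to start anyway, which is exactly the condition we want to exercise.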

robbibt commented 2 months ago

This would indeed be a big improvement to our current setup. I think allowing users to clean up their own files would be fantastic, and it would likely be cheaper for us too, since we wouldn't need to allocate extra storage just to let them log in.

benjimin commented 2 months ago

I found it tricky to replicate.

Simply doing something like `dd if=/dev/zero of=~/bigfile bs=1M`, potentially after removing some of the contents that the sync will attempt to replace, usually only resulted in kernels being unable to start, rather than spawn failures. (I only got one spawn failure in several tries; it wasn't readily reproducible.)
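For what it's worth, allocating exactly the reported free space in one call may exhaust the volume more deterministically than an open-ended `dd`. A sketch, assuming a Linux filesystem (`fallocate` isn't supported everywhere, e.g. on some network mounts, hence the `dd` fallback):

```bash
# Fill the home volume completely before retrying the spawn.
avail_kb=$(df --output=avail "$HOME" | tail -1)   # free space in 1K blocks
fallocate -l "${avail_kb}K" "$HOME/bigfile" \
  || dd if=/dev/zero of="$HOME/bigfile" bs=1M     # fallback: write until full
df -h "$HOME"   # should now show 0 available
```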