Momentlabs / athenaeum

Jupyter Notebooks as a service

How do we handle the storage of notebooks files beyond the life of the cluster? #1

Open jdrivas opened 6 years ago

jdrivas commented 6 years ago

PersistentVolumes live for the life of the cluster; if we need a longer life than that, we'll need an alternative.

jdrivas commented 6 years ago

This issue was moved to Momentlabs/athenaeum-operations#1

jdrivas commented 6 years ago

At least we should be able to do something with Snapshots, if not some more complex storage mechanism.

jdrivas commented 6 years ago

The current plan is the following:

  1. Per-user Notebook storage is on a GCP persistent disk (PD).
  2. The disk is created automatically when the user-specific PVC is created (currently done in KubeSpawner). Creating the PVC creates a PV and a PD bound to that PV, and the PVC binds to the newly created PV.
  3. The PVC is durable beyond Hub and Notebook lifecycles, but is bound to the cluster. If the cluster goes away, the PVC and PV go away, but not the PD (which is the actual storage).
  4. It is easy to attach an existing PD to a PV/PVC pair. See this note.
  5. The one caveat is that the PD needs to be in the same zone as the cluster (if it's zonal storage). Fortunately, you can fairly easily create a new PD in the required zone from a snapshot of an existing PD.
  6. If you need to reattach an old PD to a user in a cluster:
    1. Find the relevant PD
    2. Take a snapshot and create a new PD in the right zone if necessary.
    3. Create a PV/PVC referencing the new PD using the PVC naming schema described by the config KubeSpawner.pvc_template.
    4. Restart the user's singleuser-notebook (or rather, shut it down and let the hub restart it normally).
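
As a rough illustration of step 3 above, here is a minimal sketch that builds a pre-bound PV/PVC pair for an existing PD and prints it as JSON for `kubectl apply -f -`. Everything concrete in it is an assumption, not something taken from this project: the `claim-{username}` naming template, the `pv-<disk>` PV name, the `default` namespace, the `10Gi` size, and the use of the in-tree `gcePersistentDisk` volume source. Check the naming against the actual KubeSpawner config before using anything like this.

```python
import json


def pvc_name_for(username, template="claim-{username}"):
    # Hypothetical template; the real value comes from the KubeSpawner config.
    return template.format(username=username)


def build_pv_and_pvc(pd_name, username, namespace="default", size="10Gi",
                     template="claim-{username}"):
    """Return (pv, pvc) manifests that statically bind an existing PD to a user claim."""
    claim = pvc_name_for(username, template)
    pv_name = f"pv-{pd_name}"  # illustrative naming only
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": pv_name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            # Retain, so deleting the claim does not delete the disk.
            "persistentVolumeReclaimPolicy": "Retain",
            # Empty class keeps a dynamic provisioner from creating a fresh disk.
            "storageClassName": "",
            # Reserve this PV for the user's claim.
            "claimRef": {"namespace": namespace, "name": claim},
            "gcePersistentDisk": {"pdName": pd_name, "fsType": "ext4"},
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": claim, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": "",
            "volumeName": pv_name,
            "resources": {"requests": {"storage": size}},
        },
    }
    return pv, pvc


if __name__ == "__main__":
    pv, pvc = build_pv_and_pvc("restored-pd-alice", "alice")
    manifest = {"apiVersion": "v1", "kind": "List", "items": [pv, pvc]}
    print(json.dumps(manifest, indent=2))
```

Setting both `claimRef` on the PV and `volumeName` on the PVC pins the pair to each other, so the spawner should find an already-bound claim with the expected name when the user's notebook restarts.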

This notebook storage is durable beyond the cluster. However, there is no automation for reattaching the original PD to a user's Notebook. The PD description field does contain JSON that points back to the PVC and PV; see this note.
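
If we ever automate the lookup, reading that description could look something like the sketch below. The key names are an assumption based on what the GCE PD dynamic provisioner typically writes (`kubernetes.io/created-for/...`); verify them against a real disk's description before relying on this. The function name and return shape are made up for illustration.

```python
import json

# Assumed key names; confirm against an actual PD description.
PV_KEY = "kubernetes.io/created-for/pv/name"
PVC_KEY = "kubernetes.io/created-for/pvc/name"
NS_KEY = "kubernetes.io/created-for/pvc/namespace"


def owner_from_description(description):
    """Return the PV/PVC a disk was created for, or None if it can't be determined."""
    try:
        data = json.loads(description)
    except (TypeError, ValueError):
        # Missing, empty, or hand-written descriptions are not JSON.
        return None
    if not isinstance(data, dict) or PVC_KEY not in data:
        return None
    return {
        "pv": data.get(PV_KEY),
        "pvc": data[PVC_KEY],
        "namespace": data.get(NS_KEY),
    }


if __name__ == "__main__":
    sample = json.dumps({
        PV_KEY: "pvc-0000",          # placeholder values
        PVC_KEY: "claim-alice",
        NS_KEY: "default",
    })
    print(owner_from_description(sample))
```

Matching the returned PVC name against the claim naming scheme would identify which user a leftover PD belongs to.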