DataONEorg / k8s-cluster

Documentation on the DataONE Kubernetes cluster
Apache License 2.0

Add slinky pv configuration files #2

Closed: ThomasThelen closed this 3 years ago

ThomasThelen commented 3 years ago

This is a PR to open the discussion of how to handle persistent volume configurations. It includes three Kubernetes configuration files, one for each volume in the slinky deployment.

File Organization

I've listed a few alternative ways to organize the files in this directory. The most straightforward way to install each volume is kubectl apply -f <filename>.

Single Project Configuration File

In this approach, each project has a single YAML file that contains all of its persistent volume definitions. For example, instead of the three files in this PR, there would be a single slinky-pv.yaml that contains the three PV declarations.
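
A minimal sketch of what that could look like, with placeholder names, sizes, and backing storage rather than the actual slinky definitions:

```yaml
# slinky-pv.yaml (hypothetical): all of a project's PVs in one file,
# separated by YAML document markers.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: slinky-pv-1                 # placeholder name
spec:
  capacity:
    storage: 10Gi                   # placeholder size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/slinky/pv-1         # placeholder; real definitions would point at cluster storage
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: slinky-pv-2                 # placeholder name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/slinky/pv-2
# ...and a third document for the remaining volume.
```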

Unique PV Configuration File

In this approach, each persistent volume has its own file. This is the method that I picked for this PR. One extension is to create a directory for each deployment (i.e. bookkeeper/, slinky/) and apply this convention there, which sounds overly complex to me. I liked this method because it allows you to run kubectl apply, patch, and similar commands on individual definitions, which is either not possible or more involved when many definitions are bundled into a single file.
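
Split out this way, each document from the sketch above lives in its own file; a hedged sketch with the same placeholder values:

```yaml
# slinky-pv-1.yaml (hypothetical): one PV per file, so each definition can be
# applied, patched, or deleted on its own, e.g. kubectl apply -f slinky-pv-1.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: slinky-pv-1                 # placeholder name
spec:
  capacity:
    storage: 10Gi                   # placeholder size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/slinky/pv-1         # placeholder backing storage
```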

Single Configuration File

An alternative is to have a single file, volumes.yaml, that contains all of the volume declarations for every deployment.

Relation to Automated Deployments

Keeping the PV declarations as Kubernetes config files should integrate with whatever method we use to automate installing them. I think that at the end of the day it will either be something like a shell script running kubectl apply for each file, or something much larger like a Helm chart. Both cases should easily support what's been done in this PR.
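
As a rough sketch of the shell-script end of that spectrum (the directory layout and file names are assumptions, not part of this PR):

```sh
#!/bin/sh
# Hypothetical installer: apply every PV definition for a deployment,
# assuming one definition per file as in this PR.
set -e
for f in slinky/*-pv.yaml; do
  kubectl apply -f "$f"
done
```

Note that kubectl apply -f also accepts a directory, so the loop could be replaced by a single kubectl apply -f slinky/ once the files are grouped that way.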

mbjones commented 3 years ago

Thanks, Tommy.

I guess one issue to address is whether there really would be application-specific PVs. I think one of the concepts of k8s is that a sysadmin can create a PV with properties that allow multiple applications to use it via a PVC. For example, both metacat and metadig might use the same PVC, one mounting it read-write (metacat) and one mounting it read-only (metadig) to give them access to the same files. Are you saying the PV would be named after the app that was intending to mount it read-write? I think a single larger PV could also be used to support multiple PVCs. As the docs say, a PV has certain storage characteristics beyond size and access mode that would be requested in a PVC (e.g., SSD performance levels).

mbjones commented 3 years ago

OK, strike all that I just said. I am totally wrong in my thinking on this. Reading further, such as this Kubernetes storage overview, I see they conclude that:

In Kubernetes, one PV maps to one PVC, and vice versa.

Seems like we need to define our PV to use Ceph with a Retain reclaim policy, have only one PVC claim that volume, and then have that PVC mounted in multiple access modes by different pods. It's important that we use Retain mode so that the data sticks around. So, really we expect to have one PV/PVC pair for each type of storage volume needed by an application. I'm doing some more reading on how we can share volumes across applications (e.g., the metacat/metadig example above).
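
To make that concrete, here is a minimal sketch of such a PV/PVC pair, assuming the in-tree CephFS volume plugin; monitor addresses, secret names, and sizes are placeholders, and a CSI-based definition would look somewhat different:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-ceph-pv                  # placeholder name
spec:
  capacity:
    storage: 100Gi                      # placeholder size
  accessModes:
    - ReadWriteMany                     # allows read-write and read-only mounts
  persistentVolumeReclaimPolicy: Retain # keep the data even if the claim is deleted
  cephfs:
    monitors:
      - 10.0.0.1:6789                   # placeholder Ceph monitor address
    user: admin
    secretRef:
      name: ceph-secret                 # placeholder secret holding the Ceph key
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-ceph-pvc                 # the single claim on the volume above
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""                  # static binding; no dynamic provisioning
  volumeName: shared-ceph-pv            # bind explicitly to the PV above
  resources:
    requests:
      storage: 100Gi
```

Individual pods can then mount the claim read-write or read-only via the readOnly flag on their volume or volumeMount entries.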

mbjones commented 3 years ago

After further reading, it seems we would be best served by creating a StorageClass for our Ceph storage that enables dynamic provisioning of persistent volumes. That eliminates the need to create individual PVs; instead, a PVC request names the StorageClass it wants to use, and a PV is created dynamically. See the overview here: https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/
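
As a sketch of what that could look like (the provisioner name and parameters depend entirely on how Ceph CSI would be deployed on this cluster, so treat everything below as placeholders):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs                        # placeholder class name
provisioner: cephfs.csi.ceph.com      # assumes the Ceph CSI driver is installed
reclaimPolicy: Retain                 # keep dynamically provisioned PVs and their data
allowVolumeExpansion: true
parameters:
  clusterID: <ceph-cluster-id>        # placeholder
  fsName: <cephfs-filesystem>         # placeholder
---
# A claim then just names the class, and a matching PV is created on demand.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: slinky-data                   # hypothetical application claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: cephfs
  resources:
    requests:
      storage: 10Gi
```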

This approach would likely mean that PVC definitions are best kept with the application that needs them. There is still more to figure out about the best approach to mounting the same PVC from multiple applications.

gothub commented 3 years ago

Until dynamic provisioning is set up, note that different pods can use the same persistent volume claim (see also the deployment sketch after the output below):

avatar:~ slaughter$ kubectl describe pvc nfs-pvc -n metadig
Name:          nfs-pvc
Namespace:     metadig
StorageClass:
Status:        Bound
Volume:        nfs-pv
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      500Gi
Access Modes:  RWX
VolumeMode:    Filesystem
Used By:       metadig-controller-5ddb6f78fd-h568t
               metadig-scheduler-7b8866889f-4wmcj
               metadig-scorer-6f48c7f45d-kpspc
               metadig-scorer-6f48c7f45d-lcrbp
               metadig-scorer-6f48c7f45d-m727m
               metadig-scorer-6f48c7f45d-swr9r
               metadig-scorer-6f48c7f45d-x4tkt
               metadig-worker-74585bf8b9-2xvhw
               metadig-worker-74585bf8b9-5zfxp
               metadig-worker-74585bf8b9-7bbnl
               metadig-worker-74585bf8b9-7fsl2
               metadig-worker-74585bf8b9-8gzxh
               metadig-worker-74585bf8b9-bgcbt
               metadig-worker-74585bf8b9-c2cdf
               metadig-worker-74585bf8b9-dlqs4
               metadig-worker-74585bf8b9-f2g4c
               metadig-worker-74585bf8b9-f82b7
               metadig-worker-74585bf8b9-f9pwv
               metadig-worker-74585bf8b9-fzt9x
               metadig-worker-74585bf8b9-g8v2p
               metadig-worker-74585bf8b9-hm9cc
               metadig-worker-74585bf8b9-hx7bm
               metadig-worker-74585bf8b9-lnb7b
               metadig-worker-74585bf8b9-m9hr5
               metadig-worker-74585bf8b9-nxhjt
               metadig-worker-74585bf8b9-zgpjl
               metadig-worker-74585bf8b9-zp9h9
               postgres-7d7c87c466-rhxpq
Events:        <none>
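
For reference, the pod side of that sharing is just each workload naming the same claim in its volumes section. A trimmed, hypothetical sketch (deployment name, image, and mount path are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metadig-worker                # illustrative; mirrors the pods listed above
  namespace: metadig
spec:
  replicas: 2
  selector:
    matchLabels:
      app: metadig-worker
  template:
    metadata:
      labels:
        app: metadig-worker
    spec:
      containers:
        - name: worker
          image: example/metadig-worker:latest   # placeholder image
          volumeMounts:
            - name: shared-data
              mountPath: /data                   # placeholder mount path
      volumes:
        - name: shared-data
          persistentVolumeClaim:
            claimName: nfs-pvc        # the same RWX claim shown above
```

Any other deployment in the metadig namespace can reference claimName: nfs-pvc in the same way, which is how all of the pods listed above share one volume.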

ThomasThelen commented 3 years ago

Closing this since things have changed considerably since this PR was opened.