DataONEorg / slinky

Slinky, the DataONE Graph Store
Apache License 2.0

Convert to a Helm Chart #52

Open ThomasThelen opened 2 years ago

ThomasThelen commented 2 years ago

Right now deployment is handled through a Makefile. It would be great if developers could use the same tooling, like Helm, across all Kubernetes deployments.

One hoop to jump through is ordering the deployment process; we want some pods to start after others have started. Helm doesn't have an official way to specify an ordering of deployments, but it looks like there's at least one workaround.

One way is to use chart hooks. Since we have six deployments, we'd have six Helm charts. It's possible to run a chart hook before a Helm chart is deployed, and we can order hooks with a hook-weight. Helm runs hooks in ascending weight order (lowest weight first), so the weight can be used as an ordering mechanism. Unfortunately, this doesn't give us a way to make the scheduler deployment wait until the redis deployment is ready (it only lets us deploy redis first and the scheduler immediately after).

I think we might actually be able to attach an initContainer to the scheduler and worker that pings redis until it finally gets a response. Then the container will pass control to the slinky cli. Doing this, we should be able to start all deployments at once without using chart hooks.
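A rough sketch of what that wait step could look like in Python, using only the standard library. The function names (`redis_ping`, `wait_for_redis`) and the timeout values are made up for illustration and are not part of the slinky codebase. It relies on Redis accepting an inline `PING` and answering `+PONG`, and it assumes Redis has no password set (with auth enabled the reply would be an error and this would keep waiting).

```python
import socket
import time


def redis_ping(host, port, timeout=1.0):
    """Return True if the server answers an inline PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            return sock.recv(64).startswith(b"+PONG")
    except OSError:
        # Connection refused, DNS not resolvable yet, timeout, etc.
        return False


def wait_for_redis(host, port, max_wait=60.0, interval=1.0):
    """Poll until Redis answers a PING or max_wait seconds have elapsed."""
    deadline = time.monotonic() + max_wait
    while True:
        if redis_ping(host, port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

An initContainer could call something like `wait_for_redis("redis", 6379)` (host and port here are placeholders) and exit non-zero if it returns `False`, so the main container only starts once redis has responded.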

mbjones commented 2 years ago

Kubernetes is designed to be a "declarative" system for describing the state of a set of services (e.g., service X should exist). I think it might make sense to make the components robust with respect to services that are not yet up -- i.e., ordering shouldn't matter. This might be enabled by having, for example, the deployment that depends on redis do error checking such that, if redis is not available, the deployment backs off and tries again in a bit. This will also help if, for example, Kubernetes decides to move the redis service to another node, and there is a brief period where it is unavailable. From a 12-factor perspective, I take guidance from ideas like "resources can be attached to and detached from deploys at will.", and "processes are disposable, meaning they can be started or stopped at a moment’s notice."
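To make the back-off idea concrete, here is a minimal Python sketch of retrying an operation with exponential backoff and jitter. The helper name `call_with_backoff`, its parameters, and the default delays are assumptions chosen for illustration. Nothing here is taken from slinky itself, and a real worker would pass in whatever function opens its redis connection.

```python
import random
import time


def call_with_backoff(operation, max_attempts=8, base_delay=0.5, max_delay=30.0,
                      retry_on=(ConnectionError, OSError), sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on connection errors.

    The delay doubles after each failed attempt (capped at max_delay), plus
    random jitter so that many pods do not all retry at the same instant.
    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

Because the same wrapper can guard every call that touches redis, not just startup, it also covers the case where redis disappears briefly while being rescheduled to another node.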

amoeba commented 2 years ago

I'm going to start in on this today since the current set of YAML files doesn't fully deploy Slinky to either a local cluster (the files depend on our cluster, specifically cephfs) or our cluster (make doesn't apply Service defs or set up ingress).

amoeba commented 2 years ago

I pushed a first-pass Helm chart to develop under /helm, see https://github.com/DataONEorg/slinky/tree/develop/helm. This installs and the service runs correctly (starts processing). A note for anyone trying to build this: it needs access to a 0.3.0 slinky image that I've only built locally and made available on my dev cluster (you can build it too), and pushing those images up is the next order of business. There are also some TODOs in the /helm folder that need to be addressed.

mbjones commented 2 years ago

Thanks, @amoeba -- I've been pretty happy with using GitHub Actions to build and push an image to the GitHub package repository associated with the source repository -- that makes it easy to automate and easy to find the packages, the images can be pulled into any cluster, and we control the space so we can trust the images. They are public, so one has to take care not to commit secrets, etc. I have some examples in the bookkeeper and purser helm charts, among others.

mbjones commented 2 years ago

Also, I found that the use of sub-charts was really effective to get dependent services installed (e.g., rabbitmq) in our dataone-indexer refactor -- see https://github.com/DataONEorg/dataone-indexer/blob/feature-9-rabbitmq-worker/helm/Chart.yaml

amoeba commented 2 years ago

I've got a working version of the Helm chart installed on the dev cluster, available at https://api.test.dataone.org/slinky. Note: The query editor on that page initially returned only a static result. Update: the query editor should work now.

; helm install -n slinky \
          --set ingress.enabled=true \
          --set ingress.host=api.test.dataone.org \
          --set ingress.tls.secretName=ingress-nginx-tls-cert \
          --set ingress.clusterIssuer=letsencrypt-prod \
      slinky .
NAME: slinky
LAST DEPLOYED: Tue Aug  2 19:05:12 2022
NAMESPACE: slinky
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
-------------------------------------------------------------------------------
     _ _       _
 ___| (_)_ __ | | ___   _
/ __| | | '_ \| |/ / | | |
\__ \ | | | | |   <| |_| |
|___/_|_|_| |_|_|\_\\__, |
                    |___/
-------------------------------------------------------------------------------
version: 0.3.0

** Please be patient while the chart is being deployed and services are available **
You can check their status with kubectl get pods

Ingress enabled at https://api.test.dataone.org/slinky

There are three tasks I left undone: