DataONEorg / slinky

Slinky, the DataONE Graph Store
Apache License 2.0
4 stars 4 forks source link

Deployment Upgrades #48

Closed ThomasThelen closed 2 years ago

ThomasThelen commented 2 years ago

This PR should be reviewed after #43; once 43 is merged the duplicate commits should be hidden from this PR.

This PR has a few changes to the kubernetes deployment to improve reliability and deployment ease.

ReadinessProbes

Fixes #47

I added two ReadinessProbe configurations: one to redis and the other to virtuoso (commits a266898454aa7194e7d360c74f39667e2d090d17 and 6d6306cbaf77acdd49df9de54905e2f112c506bb). This allows us, at the deployment level to really know when either of these are ready for use by other pods.

The only thing that I'm not totally satisfied with here is that the Virtuoso probe tests to see if Virtuoso is online. There's one step between Virtuoso coming online and being ready that isn't being accounted for-the enable update step. Since the process that enabled the sparql update is also waiting for virtuoso to come online, it executes with a 1 second delay (this number comes from the enable-query shell script) so it's probably unlikely to cause any issues.

An alternative approach is to have the sparql-update script write to the filesystem when complete. K8 ReadinessProbe supports checking if files exists. Since the file will only exist when virtuoso is online and has sparql updates enabled it could be a reliable check. I didn't go with this method because it start to get a little convoluted and the code would start to be bound to deployment logic.

Ordered Deployment

Fixes #47

I added a deploy script for restarting deployments and for creating new ones. These should come in handy for any developers working on Slinky (restarting deployments is common). It should also reduce the chance of race conditions while deploying on production now that deployments wait for each other.

Completed in 5f15ce7

make start: Starts the deployments

Screen Shot 2021-11-06 at 9 04 35 AM

make stop: Deletes the deployments

Screen Shot 2021-11-06 at 9 02 35 AM

make restart: Restarts the deployment

Screen Shot 2021-11-06 at 9 06 35 AM

CephFS

Fixes #46 completed in 97c5441

The deployment is no longer using slinky-pv and slinky-pvc, which I've deleted off of the development server. Instead, the new cephfs definitions are in place.

Deployment Documentation

Fixes #36

I added some much needed kubernetes documentation around the deployment in 72ddb38. It also includes updates around deploying with the makefile.

Dockerhub Account Change

To clear up issue #49 I created a new account on dockerhub and google, slinkythegraph. The google account has my work email as a secondary address in case the credentials are lost. In f58aaff I went ahead and changed the deployment files to use the new location.

Testing

The main thing to test is the new deployment makefile.

  1. Checkout this branch
  2. Navigate to deploy/
  3. Run make help
  4. Confirm that there's enough information displayed
  5. Run make stop to stop any potential deployments
  6. Run make start to start a new slinky stack deployment
  7. Confirm that the deployments wait for each other before continuing
  8. Run make restart
  9. Confirm the stack is torn down and brought back up

To test CephFS:

The goal of these tests are to confirm that the deployment is mapped in the yaml files and that triples are produced and insterted into Virtuoso

  1. Switch to the dev-k8s user
  2. Delete the slinky cephfs PV
  3. kubectl apply -f deployment/slinky-pv.yaml
  4. Switch to the dev-slinky user
  5. kubectl apply -f deployment/slinky-pvc.yaml
  6. make restart
  7. kubectl proxy
  8. Visit http://127.0.0.1:8001/api/v1/namespaces/slinky/services/virtuoso:virtuoso/proxy/sparql
  9. Run a sparql query to confirm slinky triples

To test docs:

The goal here is to construct a mental picture of

  1. The deployment/pod landscape
  2. The interaction between pods
  3. Deployment

If there are any questions or unclear concepts in the documentation I can extent the docs further in these areas.

amoeba commented 2 years ago

Thanks for the changes, @ThomasThelen. The added docs and dockerhub account change look good.

Re: CephFS: I see a PV def in there but I was expecting not to since we had been talking about dynamic PVs. Were you thinking we could do this in the future and that we wouldn't need a manually-defined PV? We can follow up about this post-merge.

ThomasThelen commented 2 years ago

Good point @amoeba. I'm going to leave #46 open for the moment while this gets sorted out.