Closed ThomasThelen closed 2 years ago
This PR should be ready for review (see my note about testing). There's a followup PR, #48 that's forked off of this branch. It has some nice features added to the deployment and documentation; after reading through this PR I suggest taking a look at the other changes for a bigger picture view.
Thanks @ThomasThelen! I appreciate you coming up to speed on how this service works and getting everything deploying correctly.
This PR got kinda too big to give an effective review but since we ran through things together last week, I'm going to merge and we can go from there.
I did do a scan and I two things did pop up:
--prod
which help get around that but I think the best overall solution is to have all connections to default to localhost with default ports and override any hostnames and ports with environmental variables (ala 12F). We can also work on that post-merge.Oh, and Re: Testing
You should be able to docker compose up -d
and run pytest
to run the full test suite. I did have to do some work to get the binding installed and I wrote up docs for my future self at https://github.com/DataONEorg/slinky/blob/develop/d1lod/docs/installing-redlands-bindings.md.
This PR addresses a few issues: #42 #34 #35 which are centered around applying a new triple pattern across the data and updating virtuoso.
Changes
Image Management
Previously, the
docker/
folder acted as the place to build the scheduler and worker images. Because the workers and scheduler depend on the d1lod library, and because Dockerfiles can only copy same-directory or lower files, the d1lod folder had to be copied into each folder. Additionally, there were separate images for the workers and scheduler. This was error prone and tedious.The changes in 16c9632 and ae271ea install the d1lod package into the d1lod image (which I have at thomasthelen/slinky). This image is used in the workers and scheduler, getting rid of the need for their separate images.
Worker Changes
The worker deployment has changed by deploying separate workers, one for the default queue and the other for the dataset queue. These changes are in 4b78c5c.
Scheduler Changes
In 3051b99dbf5020f5983f057249ddc597981ffeaa the scheduler deployment was updated to use the slinky cli.
Virtuoso Changes
Instead of having a second container interact with Virtuoso to enable updating, I combined the openlink virtuoso image with the shell script to enable updates. The virtuoso server an the update-enabler are run in separate processes, shown below. This was done in 913791c
Networking Changes
When pods in separate deployments need to communicate, they have to go through the associated service. Services can be referenced the same way in docker (ie 'redis' or 'virtuoso'). 50aa722 has a few changes to accommodate this requirement.
Misc
832c7a2 changes the name of the redis deployment to just 'redis'. We don't have a need to distinguish redis deployments, and this makes it more aligned with current names.
8e03b92 Fixes a bug where, without init.py pip won't include the contents of the folder in the install package path.
Testing
Kubernetes
A good kubbernetes test is to tear down the stack and then bring it back up and check that Virtuoso has triples.
kubectl config use-context slinky
kubectl delete deployment scheduler redis virtuoso worker-default worker-dataset
kubectl apply -f virtuoso-deployment.yaml
kubectl apply -f redis-deployment.yaml
kubectl apply -f virtuoso-deployment.yaml
kubectl apply -f worker-default-deployment.yaml
kubectl apply -f virtuoso-dataset-deployment.yaml
kubectl apply -f scheduler-deployment.yaml
kubectl proxy
An optional extra step after deployment is viewing the logs for the workers to see if any jobs have been processed.
Unit Tests
The unit tests should be run the usual way. Without the RDF module locally I run into issues running the tests locally; the requirement for running virtuoso or blazegraph alongside them complicated things (docker in docker territory).
I was able to run the non-networking tests which seemed to pass okay, including the local triple store tests. @amoeba is there any way you can run these on your local machine to test out?