Closed: piratebriggs closed this pull request 5 years ago
Hi @piratebriggs, thanks again for your contribution. I appreciate your effort to move this configuration to a multi-instance version (Solr and ZooKeeper). I see that you changed solr-config.properties, adding a ZooKeeper ensemble. This would break all existing configurations (minikube, Docker for Desktop, Google Cloud). I'll accept your contribution, but I have to preserve the existing configurations, so we should find a way. What do you suggest?
@piratebriggs thanks for the contribution. As I said, I have to preserve the existing configuration for the other Kubernetes deployment models. I'll change just a small part of your contribution so that everything keeps working.
Hi,
These changes are the result of needing to host a scaled Solr cluster in AKS (Azure Kubernetes Service). Not all of the changes will be relevant to your repo (e.g. the ingress and service account), but I had to make a few changes to move from a single Solr instance to three instances, which I thought might be useful to you and to future people who find this repo.
First off, I've not looked at the existing ZooKeeper stuff, as I chose to use the canonical version from https://kubernetes.io/docs/tutorials/stateful-application/zookeeper/, which spins up an ensemble of three nodes. That tutorial was the inspiration for a lot of the changes I've made in this repo.
I'll quickly go through the interesting changes in this PR:
solr-config.properties

- solrHost key: I found out the hard way that a scaled SolrCloud cluster needs to be able to reach each of its nodes individually; configuring a single service host name here causes Solr to throw errors about leaders and followers. More on this later.
- zkHost: needs to list all the nodes in the ensemble. Also found this out the hard way :) A sketch of the resulting value follows this list.
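To make that concrete, here is a rough sketch of the zkHost setting. The host names are assumptions based on the zk-hs headless service from the K8s ZooKeeper tutorial and the default namespace, so adjust them to your environment:

```properties
# Sketch only - host names assume the K8s ZooKeeper tutorial (headless
# service zk-hs, namespace default) and a three-node ensemble.
# zkHost must list every ZooKeeper node, not a single service name:
zkHost=zk-0.zk-hs.default.svc.cluster.local:2181,zk-1.zk-hs.default.svc.cluster.local:2181,zk-2.zk-hs.default.svc.cluster.local:2181

# solrHost must resolve to the individual pod rather than a shared service
# name; in the StatefulSet it is injected per pod via SOLR_HOST (see below).
```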
service-solr.yml
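For context, the stable per-pod hostnames described below rely on the Solr service being headless (clusterIP: None). A minimal sketch of what that looks like; the name, labels and port are illustrative and may not match the repo exactly:

```yaml
# Sketch only - name, labels and port are illustrative, not the repo's exact values.
apiVersion: v1
kind: Service
metadata:
  name: solr-service
spec:
  clusterIP: None        # headless: gives each StatefulSet pod a stable DNS entry
  selector:
    app: solr
  ports:
    - name: solr
      port: 8983
```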
statefulset-solr.yml

- replicas: set to three.
- updateStrategy, podManagementPolicy & podAntiAffinity: all taken from the K8s ZooKeeper example.
- containerPort: this looked like a typo but didn't seem to have any ill effects.
- POD_HOST_NAME & SOLR_HOST: as mentioned above, the hostname of each Solr node needs to be distinct and consistent, as the nodes all need to be able to vote on the leader of each collection. This approach allocates a hostname based on the headless service name plus an ordinal index. When a pod is restarted, it starts up with the same name as before, allowing the cluster to recover. (See the StatefulSet sketch at the end of this description.)
- livenessProbe: there are a lot of examples of people using the Solr healthcheck as the livenessProbe, but I found this to be less than ideal because it reflects the status of a collection as a whole. As soon as one pod is restarted, the collection is affected, and that then caused a cascade of failures across the StatefulSet. A simple HTTP probe of the admin URL of the local instance identifies when the Java host is not responding (such as when the node loses contact with ZooKeeper).
- volumeClaimTemplates: this is also taken from the K8s ZooKeeper example. The repo previously set up a single PV and PVC; as soon as you scale to more than one instance, the pods can't start because the claim is already... claimed. The template approach asks K8s to create as many PVs and PVCs as there are pods and ensures that they get re-allocated to the correct pod on restart.

I've included a shell script for creating my setup on AKS - without the ingress stuff, this should be re-usable. Let me know if you're interested in a tidier version. Also, it'll not work on minikube due to being a single node. Assuming AWS has a default storageClass configured, the volume claim template should work OK, but I've only been focused on Azure.
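For reference, here is a trimmed sketch of the StatefulSet pieces discussed above. The names, image tag, port, probe path and storage size are illustrative assumptions; the actual statefulset-solr.yml in this PR may differ in detail:

```yaml
# Sketch only - names, image, port and sizes are assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: solr
spec:
  serviceName: solr-service          # the headless service shown earlier
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: OrderedReady  # as in the K8s ZooKeeper example
  selector:
    matchLabels:
      app: solr
  template:
    metadata:
      labels:
        app: solr
    spec:
      affinity:
        podAntiAffinity:             # keep Solr pods on separate nodes
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: solr
              topologyKey: kubernetes.io/hostname
      containers:
        - name: solr
          image: solr:7.7            # illustrative tag
          ports:
            - containerPort: 8983
              name: solr
          env:
            - name: POD_HOST_NAME    # the pod's own name, e.g. solr-0
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: SOLR_HOST        # stable per-pod DNS name via the headless service
              value: "$(POD_HOST_NAME).solr-service"
          livenessProbe:             # simple HTTP check of the local admin UI,
            httpGet:                 # not the collection-level healthcheck
              path: /solr/
              port: 8983
            initialDelaySeconds: 30
            periodSeconds: 10
  volumeClaimTemplates:              # one PV/PVC per pod, re-attached on restart
    - metadata:
        name: solr-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```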