Open janhoy opened 10 months ago
Do we have input from the zookeeper community about their preferred or recommended approach?
Do we have input from the zookeeper community about their preferred or recommended approach?
Who/where do you want to ask? The Zookeeper project does not do any Docker stuff afaik. A volunteer publishes the official zookeeper Docker image on hub. There are various ZK helm charts out there, of which bitnami seems to be the best maintained one.
One option would of course be for Solr Operator to also take on managing the ZK cluster by crafting k8s specs for the sts, pod, svc, pdb, svcmonitor, cert etc resources needed for each Solr cluster. But I know @HoustonPutman is not very keen on that idea as the maintenance burden is too big. And I agree.
Who/where do you want to ask? The Zookeeper project does not do any Docker stuff afaik.
We would ask on their dev list. Maybe we need to ask them to reconsider their stance on no container development, I don't know the history behind that decision. Maybe their PMC is not aware of how bad the situation is in the community, and as part of the community of users we would be bringing new context to them. I don't know that I trust any of the third party volunteer supported tooling to exist and be up to date long term.
Of course, if their response is to tell us to go pound sand, then we are back in the same place we are now, but maybe there's a chance we get a better outcome.
@janhoy I use https://github.com/strimzi/strimzi-kafka-operator to manage kafka cluster in k8s. Strimzi has built a unified image that includes both zookeeper and kafka. The user only needs to define a kafka resource, and the operator will create zookeeper first, and then kafka after zookeeper is running properly. I think solr-operator can also follow strimzi's idea and build zookeeper and solr in an image.
By the way, it seems that apache pulsar's current image is also an unified image (with bookkeeper).
@cdmikechen What your're proposing is effectively for Solr Operator project to take on the whole responsibility of zookeeper-operator, and that is likely not something the project is willing or ready to do.
That's why I opened this issue. In the future there is hope for the core Solr project to take in a new zookeeper node-role in which case it makes more sense for solr-operator to manage those roles natively.
I ran into the issue of having ArgoCD get into a loop when trying to delete an ArgoCD application that used the Solr helm chart as one of its components, and the only workaround was to manually force delete the SolrCloud CR. ArgoCD has an annotation that can be set on resources ArgoCD should ignore (i.e. resources managed by something else):
argocd.argoproj.io/compare-options: IgnoreExtraneous
Looking at the helm chart configuration options at https://github.com/apache/solr-operator/tree/main/helm/solr I tried setting all the exposed "annotations" options accordingly in the values.yaml file used when deploying the Solr helm chart but this was still not sufficient.
image:
tag: 8.11
replicas: 1
solrOptions:
logLevel: "DEBUG"
fullnameOverride: solr
zk:
provided:
replicas: 1
persistence:
annotations:
argocd.argoproj.io/compare-options: IgnoreExtraneous
adminServerService:
annotations:
argocd.argoproj.io/compare-options: IgnoreExtraneous
clientService:
annotations:
argocd.argoproj.io/compare-options: IgnoreExtraneous
headlessService:
annotations:
argocd.argoproj.io/compare-options: IgnoreExtraneous
zookeeperPodPolicy:
annotations:
argocd.argoproj.io/compare-options: IgnoreExtraneous
podOptions:
serviceAccountName: solr-service-account
annotations:
argocd.argoproj.io/compare-options: IgnoreExtraneous
podSecurityContext:
runAsNonRoot: false
seccompProfile:
type: RuntimeDefault
startupProbe:
timeoutSeconds: 30
periodSeconds: 10
statefulSetOptions:
annotations:
argocd.argoproj.io/compare-options: IgnoreExtraneous
commonServiceOptions:
annotations:
argocd.argoproj.io/compare-options: IgnoreExtraneous
headlessServiceOptions:
annotations:
argocd.argoproj.io/compare-options: IgnoreExtraneous
nodeServiceOptions:
annotations:
argocd.argoproj.io/compare-options: IgnoreExtraneous
ingressOptions:
annotations:
argocd.argoproj.io/compare-options: IgnoreExtraneous
configMapOptions:
annotations:
argocd.argoproj.io/compare-options: IgnoreExtraneous
dataStorage:
type: persistent
capacity: "20Gi"
persistent:
reclaimPolicy: "Delete"
pvc:
name: "solr-data"
annotations:
solrapp: "solr-data"
argocd.argoproj.io/compare-options: IgnoreExtraneous
labels:
solrapp: "solr-data"
Looking at the resources underneath the SolrCloud CR I found the following that lacked the IgnoreExtraneous annotation:
Then there were a few other resources that also lacked it but I think these are ok without it (based on previous experience with ArgoCD and other operators):
I also think the Bitnami Zookeeper chart would be a better choice here. I haven't used the Bitnami ZK chart myself but I have had a pretty good experience using some containers and other charts from Bitnami in the past and the Bitnami ZK chart has good options that seemingly makes it easy to set annotations on all resources:
commonAnnotations | Add annotations to all the deployed resources | {}
This issue would mainly be to add DEPRECATION warning to our documentation and also log files that the zk-operator option is deprecated and will be removed in the future. Is this a way we want to move?
Ref discussions in https://github.com/apache/solr-operator/issues/517#issuecomment-1428310490
The ZK operator is a project that is not well maintained (74 open issues and no response to PRs). They lag in releases (on zk 3.7.1 while latest is 3.9.0) and you cannot even upgrade Zookeeper by referencing another ZK docker image, since the zk-operator require their own custom image with some added tools. This will be a problem in the future unless pravega ups their game one this.
Other issues with the architecture:
So let's consider declaring the zk-operator support deprecated. We can then either recommend users to use bitnami ZK chart separately and copy the ZK address. Or if possible, a tighter integration where you can just provide a namespace+name and let solr-operator figure out the connection string.