apache / solr-operator

Official Kubernetes operator for Apache Solr
https://solr.apache.org/operator
Apache License 2.0

Deprecate zookeeper-operator #600

Open janhoy opened 10 months ago

janhoy commented 10 months ago

Ref discussions in https://github.com/apache/solr-operator/issues/517#issuecomment-1428310490

The ZK operator is not well maintained (74 open issues and no response to PRs). Its releases lag behind (it is on ZK 3.7.1 while the latest is 3.9.0), and you cannot even upgrade ZooKeeper by referencing another ZK Docker image, since the zk-operator requires its own custom image with some added tools. This will be a problem in the future unless Pravega ups their game on this.

Other issues with the architecture:

So let's consider declaring the zk-operator support deprecated. We can then either recommend that users deploy the Bitnami ZK chart separately and copy the ZK address, or, if possible, offer a tighter integration where you just provide a namespace+name and let solr-operator figure out the connection string.
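As a rough sketch of the first option, assume ZooKeeper is installed separately with the Bitnami chart as a hypothetical release named `zk` in a namespace named `zookeeper`. The Solr chart's values could then point at that ensemble through `zk.address`. The service hostname and chroot below are illustrative assumptions, not values taken from this thread:

```yaml
# values.yaml for the Solr helm chart (sketch, not a tested configuration).
# Assumes an externally managed ZooKeeper, e.g. the Bitnami chart installed
# as release "zk" in namespace "zookeeper" (both names are hypothetical).
zk:
  # Connection string copied from the external ZK client service.
  address: "zk-zookeeper.zookeeper.svc.cluster.local:2181"
  # Optional chroot so this SolrCloud keeps its own subtree in ZooKeeper.
  chroot: "/solr"
```

With an explicit address set, the `zk.provided` block should no longer be needed, so the zk-operator would not have to be installed for this SolrCloud.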

madrob commented 10 months ago

Do we have input from the zookeeper community about their preferred or recommended approach?

janhoy commented 10 months ago

> Do we have input from the zookeeper community about their preferred or recommended approach?

Who/where do you want to ask? The ZooKeeper project does not do any Docker stuff afaik. A volunteer publishes the official zookeeper Docker image on Docker Hub. There are various ZK helm charts out there, of which Bitnami's seems to be the best maintained.

One option would of course be for Solr Operator to also take on managing the ZK cluster by crafting k8s specs for the sts, pod, svc, pdb, svcmonitor, cert etc resources needed for each Solr cluster. But I know @HoustonPutman is not very keen on that idea as the maintenance burden is too big. And I agree.

madrob commented 10 months ago

> Who/where do you want to ask? The Zookeeper project does not do any Docker stuff afaik.

We would ask on their dev list. Maybe we need to ask them to reconsider their stance on no container development; I don't know the history behind that decision. Maybe their PMC is not aware of how bad the situation is in the community, and as part of the community of users we would be bringing new context to them. I don't know that I trust any of the third-party, volunteer-supported tooling to exist and be up to date long term.

Of course, if their response is to tell us to go pound sand, then we are back in the same place we are now, but maybe there's a chance we get a better outcome.

cdmikechen commented 9 months ago

@janhoy I use https://github.com/strimzi/strimzi-kafka-operator to manage a Kafka cluster in k8s. Strimzi has built a unified image that includes both ZooKeeper and Kafka. The user only needs to define a Kafka resource, and the operator will create ZooKeeper first, and then Kafka once ZooKeeper is running properly. I think solr-operator could also follow Strimzi's idea and build ZooKeeper and Solr into one image.

By the way, it seems that Apache Pulsar's current image is also a unified image (with BookKeeper).

janhoy commented 9 months ago

@cdmikechen What you're proposing is effectively for the Solr Operator project to take on the whole responsibility of zookeeper-operator, and that is likely not something the project is willing or ready to do.

That's why I opened this issue. In the future there is hope for the core Solr project to take in a new zookeeper node-role, in which case it makes more sense for solr-operator to manage those roles natively.

bergner commented 4 months ago

I ran into the issue of having ArgoCD get into a loop when trying to delete an ArgoCD application that used the Solr helm chart as one of its components, and the only workaround was to manually force delete the SolrCloud CR. ArgoCD has an annotation that can be set on resources ArgoCD should ignore (i.e. resources managed by something else):

argocd.argoproj.io/compare-options: IgnoreExtraneous

Looking at the helm chart configuration options at https://github.com/apache/solr-operator/tree/main/helm/solr I tried setting all the exposed "annotations" options accordingly in the values.yaml file used when deploying the Solr helm chart but this was still not sufficient.

image:
  tag: 8.11
replicas: 1
solrOptions:
  logLevel: "DEBUG"

fullnameOverride: solr

zk:
  provided:
    replicas: 1
    persistence:
      annotations:
        argocd.argoproj.io/compare-options: IgnoreExtraneous
    adminServerService:
      annotations:
        argocd.argoproj.io/compare-options: IgnoreExtraneous
    clientService:
      annotations:
        argocd.argoproj.io/compare-options: IgnoreExtraneous
    headlessService:
      annotations:
        argocd.argoproj.io/compare-options: IgnoreExtraneous
    zookeeperPodPolicy:
      annotations:
        argocd.argoproj.io/compare-options: IgnoreExtraneous
podOptions:
  serviceAccountName: solr-service-account
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
  podSecurityContext:
    runAsNonRoot: false
    seccompProfile:
      type: RuntimeDefault
  startupProbe:
    timeoutSeconds: 30
    periodSeconds: 10
statefulSetOptions:
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
commonServiceOptions:
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
headlessServiceOptions:
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
nodeServiceOptions:
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
ingressOptions:
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
configMapOptions:
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
dataStorage:
  type: persistent
  capacity: "20Gi"
  persistent:
    reclaimPolicy: "Delete"
    pvc:
      name: "solr-data"
      annotations:
        solrapp: "solr-data"
        argocd.argoproj.io/compare-options: IgnoreExtraneous
      labels:
        solrapp: "solr-data"

Looking at the resources underneath the SolrCloud CR I found the following that lacked the IgnoreExtraneous annotation:

Then there were a few other resources that also lacked it but I think these are ok without it (based on previous experience with ArgoCD and other operators):

I also think the Bitnami Zookeeper chart would be a better choice here. I haven't used the Bitnami ZK chart myself, but I have had a pretty good experience using some containers and other charts from Bitnami in the past, and the Bitnami ZK chart has good options that seemingly make it easy to set annotations on all resources:

commonAnnotations | Add annotations to all the deployed resources | {}
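If ZooKeeper were deployed with the Bitnami chart instead, the ArgoCD annotation could presumably be set in one place through `commonAnnotations`. A minimal sketch of such a values file, untested against ArgoCD:

```yaml
# values.yaml for the Bitnami ZooKeeper chart (sketch).
# commonAnnotations is described by the chart as adding annotations
# to all the deployed resources.
commonAnnotations:
  argocd.argoproj.io/compare-options: IgnoreExtraneous
```

Whether this is enough to stop the ArgoCD deletion loop described above is an open assumption; it only shows where the annotation would go.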

janhoy commented 4 months ago

This issue would mainly be about adding a DEPRECATION warning to our documentation, and also to the logs, stating that the zk-operator option is deprecated and will be removed in the future. Is this a direction we want to move in?