Altinity / clickhouse-operator

Altinity Kubernetes Operator for ClickHouse creates, configures and manages ClickHouse clusters running on Kubernetes
https://altinity.com
Apache License 2.0

POD annotations are dropped with the reconcile of CHK STS #1469

Open jirislav opened 1 month ago

jirislav commented 1 month ago

Keeping the pod annotations is essential for running the workload on EKS when the Fargate profile is the default one.

Dropping essential annotations, such as "eks.amazonaws.com/compute-type" = "ec2", will cause the pod to be unschedulable.

Example manifest:

apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: clickhouse-keeper
spec:
  configuration:
    clusters:
      - name: chk
        layout:
          replicasCount: 3
  templates:
    podTemplates:
      - name: clickhouse-keeper
        metadata:
          annotations:
            eks.amazonaws.com/compute-type: "ec2"

Interestingly, the first pod of the 3 replicas starts with the correct annotation, but the second one does not, because the annotations are dropped from the underlying StatefulSet.
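To confirm whether the annotations are missing from the StatefulSet's pod template (rather than only from a running pod), the two can be compared directly. The following is a minimal sketch, not part of the operator: the helper names `pod_template_annotations` and `missing_annotations` are hypothetical, and the two sample objects are trimmed-down stand-ins for real StatefulSet JSON.

```python
import json

# Annotation the manifest above expects on every keeper pod.
EXPECTED = {"eks.amazonaws.com/compute-type": "ec2"}


def pod_template_annotations(statefulset):
    """Return the annotations under spec.template.metadata of a StatefulSet dict."""
    spec = statefulset.get("spec") or {}
    template = spec.get("template") or {}
    metadata = template.get("metadata") or {}
    return metadata.get("annotations") or {}


def missing_annotations(expected, statefulset):
    """Return the expected annotations that are absent or have a different value."""
    actual = pod_template_annotations(statefulset)
    return {key: value for key, value in expected.items() if actual.get(key) != value}


# Hypothetical sample objects (assumptions for illustration, not operator output).
healthy = json.loads(
    '{"spec": {"template": {"metadata": {"annotations": '
    '{"eks.amazonaws.com/compute-type": "ec2"}}}}}'
)
dropped = json.loads('{"spec": {"template": {"metadata": {"labels": {"app": "chk"}}}}}')

print(missing_annotations(EXPECTED, healthy))
print(missing_annotations(EXPECTED, dropped))
```

In practice the input could come from `kubectl get statefulset <statefulset-name> -o json`, parsed with `json.load`; a non-empty result after a reconcile would match the behavior described here.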

Note that I also see this in the log of the operator, which is possibly the result of this behavior:

E0802 07:02:22.310975       1 reconciler.go:299] err: Operation cannot be fulfilled on clickhousekeeperinstallations.clickhouse-keeper.altinity.com "chk": the object has been modified; please apply your changes to the latest version and try again

jirislav commented 1 month ago

Please see this pull request against the 0.24.0 branch 🙏🏿.

Kavinjsir commented 1 month ago

I encountered a similar issue when adding additional annotations for Datadog agent metrics scraping.

Here are the details:

  1. Defining annotations in the podTemplates for a CHK manifest works successfully when creating the CHK for the first time.
  2. However, modifying the annotations block later on causes the reconciliation process to drop all annotations.

g-marius commented 1 month ago

We are also randomly seeing reconciler errors on some deploys. Since we are using annotations in our environment, I suspect it is the same issue as the one mentioned above for Datadog.

 1 reconciler.go:299] err: Operation cannot be fulfilled on clickhousekeeperinstallations.clickhouse-keeper.altinity.com "keeper": the object has been modified; please apply your changes to the latest version and try again

Slach commented 6 days ago

@g-marius do you use something like Flux or ArgoCD?
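For context, the "the object has been modified" message quoted earlier in this thread is the conflict Kubernetes returns when an update is based on a stale version of an object, which can happen when a second writer changes the same resource between a read and a write. The toy model below is only a sketch of that mechanism under stated assumptions: `Store`, `Conflict`, `update_with_retry`, and the `example.com/...` annotation keys are all hypothetical and do not reflect the operator's or the API server's actual code.

```python
class Conflict(Exception):
    """Raised when an update is based on a stale version of the object."""


class Store:
    """Toy in-memory stand-in for an API server holding a single object."""

    def __init__(self, annotations):
        self.version = 1
        self.annotations = dict(annotations)

    def read(self):
        # Return the current version together with a copy of the data.
        return self.version, dict(self.annotations)

    def update(self, version, annotations):
        # Reject writes that were prepared against an older version.
        if version != self.version:
            raise Conflict("the object has been modified")
        self.annotations = dict(annotations)
        self.version += 1


def update_with_retry(store, mutate, attempts=3, interfering_write=None):
    """Re-read and re-apply `mutate` until the write succeeds; return tries used."""
    for attempt in range(attempts):
        version, annotations = store.read()
        mutate(annotations)
        if interfering_write is not None and attempt == 0:
            interfering_write()  # simulate a second writer racing the first try
        try:
            store.update(version, annotations)
            return attempt + 1
        except Conflict:
            continue
    raise Conflict("gave up after %d attempts" % attempts)


store = Store({"eks.amazonaws.com/compute-type": "ec2"})


def other_writer():
    # Hypothetical second controller (for example a GitOps tool) editing the object.
    version, annotations = store.read()
    annotations["example.com/managed-by"] = "gitops"
    store.update(version, annotations)


def add_marker(annotations):
    annotations["example.com/reconciled"] = "true"


tries = update_with_retry(store, add_marker, interfering_write=other_writer)
print(tries, sorted(store.annotations))
```

In this model the first write fails because the other writer bumped the version, and the retry succeeds after re-reading, so both writers' annotations survive. Whether something similar explains the errors reported here is not established in the thread.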