Altinity / clickhouse-operator

Altinity Kubernetes Operator for ClickHouse creates, configures and manages ClickHouse® clusters running on Kubernetes
https://altinity.com
Apache License 2.0

secret for inter server communication is not rolled out #1054

Closed · vigodeltoro closed this issue 1 year ago

vigodeltoro commented 1 year ago

Hi, I followed your examples to roll out a secret for inter-server communication, but it isn't rolled out:

Clickhouse Operator v0.20

That's my manifest:

#########
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "secret"
spec:
  configuration:
    settings:
      logger/level: "warning"
      compression/case/method: "lz4"
    users:
      # default/password: querty
      # default/profile: default
      # default/access_management: 0
      # default/networks/ip:
      #   - "::1"
      #   - "127.0.0.1"
      # default/k8s_secret_password: clickhouse/clickhouse-credentials/default
      admin/password: querty
      admin/profile: default
      admin/access_management: 1
      admin/networks/ip:
        - "::/0"
      admin/k8s_secret_password: clickhouse/clickhouse-credentials/admin
      superset/password: querty
      superset/profile: readonly
      superset/default_database: sisr_dev
      superset/allow_databases/database:
        - sisr_dev
      superset/readonly/readonly: 1
      superset/networks/ip:
        - "::/0"
      superset/k8s_secret_password: clickhouse/clickhouse-credentials/superset
      superset/max_memory_usage: "1000000000"

    zookeeper:
      nodes:
      - host: clickhouse-keeper
    clusters:
      - name: "auto"
        secret:
          auto: "True"
        layout:
          shardsCount: 2
          replicasCount: 1

  defaults:
    templates:
      podTemplate: pod-template
      dataVolumeClaimTemplate: data-volume-template
      logVolumeClaimTemplate: data-volume-template
      serviceTemplate: chi-service-template

  templates:
    serviceTemplates:
      - name: chi-service-template
        generateName: "clickhouse-{chi}"
        spec:
          securityContext:
            runAsUser: 101
            runAsGroup: 101
            fsGroup: 101
          ports:
            - name: http
              port: 8123
            - name: tcp
              port: 9000
            - name: interserver
              port: 9009
          type: ClusterIP
          clusterIP: None
    podTemplates:
      - name: pod-template
        spec:
          securityContext:
            runAsUser: 101
            runAsGroup: 101
            fsGroup: 101
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:22.11

    volumeClaimTemplates:
      - name: data-volume-template
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
      - name: log-volume-template
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 100Mi

############

If I log in to a pod I see:

##########
clickhouse@chi-secret-auto-0-0-0:/etc/clickhouse-server/config.d$ cat chop-generated-remote_servers.xml
<yandex>
    <remote_servers>
        <!-- User-specified clusters -->
        <auto>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>chi-secret-auto-0-0</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>chi-secret-auto-0-1</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>chi-secret-auto-1-0</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>chi-secret-auto-1-0</host>
                    <port>9000</port>
                </replica>
            </shard>
        </auto>
    </remote_servers>
</yandex>

##########

Am I configuring something wrong? None of the variants (plaintext, secret, auto) worked for me. Can anybody help me out?

Thanks a lot, best regards

chancez commented 1 year ago

I think it's probably that, as with #1051, the operator doesn't have permission to create the secret, so it doesn't actually reconfigure ClickHouse. I manually fixed the RBAC permissions in my environment so the operator can create secrets, and it's successfully configuring the nodes with the internode secret:

...
spec:
  containers:
  - env:
    - name: CLICKHOUSE_INTERNODE_CLUSTER_SECRET
      valueFrom:
        secretKeyRef:
          key: secret
          name: hubble-timescape-hubble-data-auto-secret
...
(⎈|kind-kind:default) ~/p/w/kind-cilium-ce-helm-install ❯❯❯ k exec -it -n hubble-timescape chi-hubble-timescape-hubble-data-0-0-0 bash
bash-5.1# echo $CLICKHOUSE_INTERNODE_CLUSTER_SECRET
gNUPoNMpIjE
bash-5.1# grep -R CLICKHOUSE_INTERNODE_CLUSTER_SECRET /etc/clickhouse-server/
/etc/clickhouse-server/config.d/chop-generated-remote_servers.xml:            <secret from_env="CLICKHOUSE_INTERNODE_CLUSTER_SECRET" />
/etc/clickhouse-server/config.d/..2022_11_21_17_04_29.1261097742/chop-generated-remote_servers.xml:            <secret from_env="CLICKHOUSE_INTERNODE_CLUSTER_SECRET" />
/etc/clickhouse-server/config.d/..data/chop-generated-remote_servers.xml:            <secret from_env="CLICKHOUSE_INTERNODE_CLUSTER_SECRET" />
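
A rough sketch of that kind of RBAC change (the ClusterRole name and verb list here are illustrative; an actual install may name the role differently and already carry other rules):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # hypothetical name, adjust to the ClusterRole your operator install uses
  name: clickhouse-operator-clickhouse
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    # "create" and "update" are the extra verbs that would let the operator
    # generate and store the auto inter-node secret itself
    verbs: ["get", "list", "create", "update"]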

vigodeltoro commented 1 year ago

Hi chancez, thanks a lot, I will have a look at that :) and try it out.

But I would expect it to work if I create the secret manually and use the secret reference function mentioned in

https://github.com/Altinity/clickhouse-operator/blob/0.20.0/docs/chi-examples/21-secure-cluster-secret-03-secret-ref.yaml ??
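
For reference, that example wires the cluster secret to a pre-created Kubernetes Secret roughly like this (a sketch; the Secret name, key and value below are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: clickhouse-cluster-secret    # placeholder
stringData:
  secret: "change-me"                # placeholder inter-node secret value
---
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "secret-ref"
spec:
  configuration:
    clusters:
      - name: "default"
        secret:
          valueFrom:
            secretKeyRef:
              name: clickhouse-cluster-secret
              key: secret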

chancez commented 1 year ago

@vigodeltoro I'm not sure, because the operator is likely programmed to look up the secret when the cluster.secret options are present, and if RBAC keeps it from looking up the secret, it gets wedged there, retrying the lookup forever.
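
If it is an RBAC problem, something like the following should show it (the service account name and namespaces are guesses, adjust them to your install):

kubectl auth can-i get secrets -n <chi-namespace> \
  --as=system:serviceaccount:kube-system:clickhouse-operator
kubectl auth can-i create secrets -n <chi-namespace> \
  --as=system:serviceaccount:kube-system:clickhouse-operator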

vigodeltoro commented 1 year ago

Hi chancez, ah, that could be it, thanks. I will look into that and come back to you :)

vigodeltoro commented 1 year ago

Hi chancez, it took me some time to test because our Kubernetes admin was on holiday. The operator role is able to get and list secrets:

kubectl get clusterrole clickhouse-operator-clickhouse -o yaml

--- snap ---
 resources:
  - secrets
  verbs:
  - get
  - list
 --- snap ---

I tried it again in a cluster deployment with a plaintext secret, so I would guess that no Kubernetes Secret should be necessary there, but no success:

Config of cluster:
--- snap----
clusters:
    - name: "deployment-pv"
      secret:
          value: "plaintext"
      layout:
          shardsCount: 2
          replicasCount: 2
--- snap----

Config of pod

<remote_servers>
        <!-- User-specified clusters -->
        <deployment-pv>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>chi-pv-log-deployment-pv-0-0</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>chi-pv-log-deployment-pv-0-1</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>chi-pv-log-deployment-pv-1-0</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>chi-pv-log-deployment-pv-1-1</host>
                    <port>9000</port>
                </replica>
            </shard>
        </deployment-pv>

I would expect that:

<remote_servers>
        <!-- User-specified clusters -->
        <deployment-pv>
           <secret>plaintext</secret>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>chi-pv-log-deployment-pv-0-0</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>chi-pv-log-deployment-pv-0-1</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>chi-pv-log-deployment-pv-1-0</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>chi-pv-log-deployment-pv-1-1</host>
                    <port>9000</port>
                </replica>
            </shard>
        </deployment-pv>

Do you have any idea? It would be very helpful if we could fix this. At the moment I'm injecting the secret via a ConfigMap and restarting the pods, but that is very messy :(
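
One way such an injection can be expressed is through the CHI "files" section, which the operator mounts under config.d so that ClickHouse merges the extra <secret> element into the generated cluster definition (a sketch under that assumption, not necessarily the exact workaround used here; the file name is a placeholder):

spec:
  configuration:
    files:
      # merged by ClickHouse into the <deployment-pv> cluster that the operator
      # writes to chop-generated-remote_servers.xml
      config.d/cluster-secret.xml: |
        <yandex>
          <remote_servers>
            <deployment-pv>
              <secret>plaintext</secret>
            </deployment-pv>
          </remote_servers>
        </yandex>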

best and thanks :)

vigodeltoro commented 1 year ago

I found the issue. For my first tests I deployed the new clickhouse-operator version, but I didn't look at the newly running operator pod, because that was my last idea. Today I did, and for reasons I don't know it had respawned an old version (0.19.3, maybe from a cluster cache).

Now it's working. Thanks for your help and sorry for wasting your time :/
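
For anyone hitting the same symptom, a quick way to confirm which operator image is actually running (label and namespace may differ per install):

kubectl -n kube-system get pods -l app=clickhouse-operator \
  -o jsonpath='{.items[*].spec.containers[*].image}'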