tanner-bruce opened 4 months ago
This also seems to affect sections of the code like https://github.com/Altinity/clickhouse-operator/blob/0.24.0/pkg/model/chop_config.go#L151, where the status is used to compare the old and new configuration, causing unnecessary restarts.
One potential solution could be to allow configurable state storage, so that object storage can be used for state instead of the Kubernetes status.
One workaround can be to bake some of your configuration into the ClickHouse image itself, reducing the status size below the maximum object size allowed by the API server.
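As an illustration only, a minimal sketch of that workaround, assuming a hypothetical custom image that already ships the storage configuration under /etc/clickhouse-server/config.d/ (the image tag and path below are assumptions, not something from this issue):

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "clickhouse"
spec:
  configuration:
    # storage.xml is no longer inlined under `files:`; it is copied into
    # /etc/clickhouse-server/config.d/ when the custom image is built, so it
    # never ends up in the CHI status (e.g. in normalizedCompleted).
    clusters:
      - name: "7"
        layout:
          shardsCount: 14
          replicasCount: 2
  templates:
    podTemplates:
      - name: ingest-pod-template
        spec:
          containers:
            - name: clickhouse
              # hypothetical tag for an image that bundles storage.xml
              image: my-registry/clickhouse-server:24.5.1.1763-storage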
@tanner-bruce, is it possible to share the full CHI? I can see you are using configuration at the shard level -- what is the reason for that? Maybe the CHI can be made more compact.
@alex-zaitsev I'm not sure what shard-level configuration you mean. We have some different node types, and we have different disk sizes for some clusters.
@ondrej-smola That is a good idea; we could certainly do that for the storage XML, but I think that is about it.
Currently we are looking at splitting the different clusters into their own CHIs and then using cluster discovery to link them to our query pods, but we are not sure how to migrate to that. A rough sketch of what one split-out CHI could look like is below.
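For illustration only (the name is hypothetical, and the cluster-discovery / remote_servers wiring for the query tier is omitted because we have not settled on it), a split-out, per-cluster CHI might look roughly like this:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "clickhouse-cluster-7"   # hypothetical per-cluster name
spec:
  configuration:
    clusters:
      - name: "7"
        layout:
          shardsCount: 14
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-7
  templates:
    volumeClaimTemplates:
      - name: data-7
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd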
Here is our full CHI, redacted mildly.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "clickhouse"
spec:
  configuration:
    profiles:
      clickhouse_operator/skip_unavailable_shards: 1
      materialize_ttl_after_modify: 0
      default/skip_unavailable_shards: 1
      readonly/readonly: 1
    settings:
      async_insert_threads: 30
      background_common_pool_size: 24
      background_distributed_schedule_pool_size: 24
      background_move_pool_size: 12
      background_pool_size: 36
      background_schedule_pool_size: 24
      logger/level: debug
      max_table_size_to_drop: 0
      prometheus/asynchronous_metrics: true
      prometheus/endpoint: /metrics
      prometheus/events: true
      prometheus/metrics: true
      prometheus/port: "8888"
      prometheus/status_info: true
    clusters:
      - name: "7"
        layout:
          shardsCount: 14
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-7
      - name: "6"
        layout:
          shardsCount: 14
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-6
      - name: "5"
        layout:
          shardsCount: 14
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-5
          podTemplate: ingest-2-pod-template
      - name: "4"
        layout:
          shardsCount: 8
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-4
      - name: "3"
        layout:
          shardsCount: 8
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-3
      - name: "2"
        layout:
          shardsCount: 2
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-2
      - name: "1"
        layout:
          shardsCount: 1
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-1
      - name: "query"
        templates:
          clusterServiceTemplate: query-service
          dataVolumeClaimTemplate: query-data
          podTemplate: query-pod-template
        layout:
          shardsCount: 4
          replicasCount: 1
    files:
      conf.d/storage.xml: "<clickhouse> <storage_configuration> <disks> <gcs> <type>s3</type> <access_key_id from_env=\"GCS_ACCESS_KEY\" /> <secret_access_key from_env=\"GCS_SECRET_KEY\" /> <endpoint from_env=\"GCS_ENDPOINT\" /> <metadata_path>/var/lib/clickhouse/disks/gcs/</metadata_path> <support_batch_delete>false</support_batch_delete> </gcs> <gcs_6m> <type>s3</type> <access_key_id from_env=\"GCS_ACCESS_KEY\" /> <secret_access_key from_env=\"GCS_SECRET_KEY\" /> <endpoint from_env=\"GCS_ENDPOINT_6M_RETENTION\" /> <metadata_path>/var/lib/clickhouse/disks/gcs_6m/</metadata_path> <support_batch_delete>false</support_batch_delete> </gcs_6m> <gcs_1y> <type>s3</type> <access_key_id from_env=\"GCS_ACCESS_KEY\" /> <secret_access_key from_env=\"GCS_SECRET_KEY\" /> <endpoint from_env=\"GCS_ENDPOINT_1Y_RETENTION\" /> <metadata_path>/var/lib/clickhouse/disks/gcs_1y/</metadata_path> <support_batch_delete>false</support_batch_delete> </gcs_1y> <gcs_cache> <type>cache</type> <disk>gcs</disk> <cache_enabled>true</cache_enabled> <data_cache_enabled>true</data_cache_enabled> <enable_filesystem_cache>true</enable_filesystem_cache> <path>/var/lib/clickhouse/disks/gcscache/</path> <enable_filesystem_cache_log>true</enable_filesystem_cache_log> <max_size>10Gi</max_size> </gcs_cache> <gcs_6m_cache> <type>cache</type> <disk>gcs_6m</disk> <cache_enabled>true</cache_enabled> <data_cache_enabled>true</data_cache_enabled> <enable_filesystem_cache>true</enable_filesystem_cache> <path>/var/lib/clickhouse/disks/gcscache_6m/</path> <enable_filesystem_cache_log>true</enable_filesystem_cache_log> <max_size>10Gi</max_size> </gcs_6m_cache> <gcs_1y_cache> <type>cache</type> <disk>gcs_1y</disk> <cache_enabled>true</cache_enabled> <data_cache_enabled>true</data_cache_enabled> <enable_filesystem_cache>true</enable_filesystem_cache> <path>/var/lib/clickhouse/disks/gcscache_1y/</path> <enable_filesystem_cache_log>true</enable_filesystem_cache_log> <max_size>95Gi</max_size> </gcs_1y_cache> <ssd> <type>local</type> <path>/var/lib/clickhouse/disks/localssd/</path> </ssd> </disks> <policies> <storage_main> <volumes> <ssd> <disk>ssd</disk> </ssd> <gcs> <disk>gcs_cache</disk> <perform_ttl_move_on_insert>0</perform_ttl_move_on_insert> <prefer_not_to_merge>true</prefer_not_to_merge> </gcs> <gcs_6m> <disk>gcs_6m_cache</disk> <perform_ttl_move_on_insert>0</perform_ttl_move_on_insert> <prefer_not_to_merge>true</prefer_not_to_merge> </gcs_6m> <gcs_1y> <disk>gcs_1y_cache</disk> <perform_ttl_move_on_insert>0</perform_ttl_move_on_insert> <prefer_not_to_merge>true</prefer_not_to_merge> </gcs_1y> </volumes> <move_factor>0.1</move_factor> </storage_main> </policies> </storage_configuration> </clickhouse>"
    zookeeper:
      nodes:
        - host: clickhouse-keeper
          port: 2181
      session_timeout_ms: 30000
      operation_timeout_ms: 10000
      root: /root
      identity: user:password
    users:
      migrations/access_management: 1
      migrations/k8s_secret_password: clickhouse/clickhouse
      migrations/networks/ip: "::/0"
      exporter/k8s_secret_password: clickhouse/clickhouse
      exporter/networks/ip: "::/0"
      grafana/k8s_secret_password: clickhouse/clickhouse
      grafana/networks/ip: "::/0"
      grafana/grants/query:
        - GRANT SELECT ON *.*
        - REVOKE ALL PRIVILEGES ON .
        - REVOKE ALL PRIVILEGES ON .
        - REVOKE ALL PRIVILEGES ON .
        - REVOKE ALL PRIVILEGES ON .
        - REVOKE ALL PRIVILEGES ON .
        - REVOKE ALL PRIVILEGES ON .
      api/k8s_secret_password: clickhouse/clickhouse
      api/networks/ip: "::/0"
  defaults:
    templates:
      logVolumeClaimTemplate: logs
      podTemplate: ingest-pod-template
      serviceTemplate: default-service
      clusterServiceTemplate: cluster-ingest-service
    storageManagement:
      reclaimPolicy: Retain
  templates:
    serviceTemplates:
      - name: default-service
        generateName: clickhouse-{chi}
        metadata:
          annotations:
            networking.gke.io/load-balancer-type: "Internal"
            networking.gke.io/internal-load-balancer-allow-global-access: "true"
        spec:
          ports:
            - name: http
              port: 8123
            - name: tcp
              port: 9000
          type: LoadBalancer
      - name: cluster-ingest-service
        generateName: ingest-{cluster}-{chi}
        metadata:
          annotations:
            networking.gke.io/load-balancer-type: "Internal"
            networking.gke.io/internal-load-balancer-allow-global-access: "true"
        spec:
          ports:
            - name: http
              port: 8123
            - name: tcp
              port: 9000
          type: LoadBalancer
      - name: query-service
        generateName: query-{chi}
        metadata:
          annotations:
            networking.gke.io/load-balancer-type: "Internal"
            networking.gke.io/internal-load-balancer-allow-global-access: "true"
        spec:
          ports:
            - name: http
              port: 8123
            - name: tcp
              port: 9000
          type: LoadBalancer
    podTemplates:
      - name: ingest-pod-template
        metadata:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/schema: "http"
            prometheus.io/port: "8888"
            prometheus.io/path: "/metrics"
        spec:
          tolerations:
            - key: "app"
              operator: "Equal"
              value: "ingest"
              effect: "NoExecute"
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: cloud.google.com/gke-nodepool
                        operator: In
                        values:
                          - ingest
          containers:
            - env:
                - name: SHARD_BUCKET_PATH
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: POD_NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
                - name: CLUSTER
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.labels['clickhouse.altinity.com/cluster']
                - name: GCS_ENDPOINT
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
                - name: GCS_ENDPOINT_6M_RETENTION
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
                - name: GCS_ENDPOINT_1Y_RETENTION
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
              envFrom:
                - secretRef:
                    name: clickhouse
              image: clickhouse-server:24.5.1.1763
              name: clickhouse
              startupProbe:
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                failureThreshold: 100
                periodSeconds: 9
                timeoutSeconds: 1
              livenessProbe:
                failureThreshold: 100
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                initialDelaySeconds: 60
                periodSeconds: 30
                successThreshold: 1
                timeoutSeconds: 1
              readinessProbe:
                failureThreshold: 300
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                initialDelaySeconds: 10
                periodSeconds: 30
                successThreshold: 1
                timeoutSeconds: 1
              ports:
                - name: "metrics"
                  containerPort: 8888
              resources:
                limits:
                  memory: 10Gi
                requests:
                  cpu: 1000m
                  memory: 10Gi
              volumeMounts:
                - name: cache
                  mountPath: /var/lib/clickhouse/disks/gcscache
                - name: cache-6m
                  mountPath: /var/lib/clickhouse/disks/gcscache_6m
                - name: cache-1y
                  mountPath: /var/lib/clickhouse/disks/gcscache_1y
      - name: ingest-2-pod-template
        metadata:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/schema: "http"
            prometheus.io/port: "8888"
            prometheus.io/path: "/metrics"
        spec:
          tolerations:
            - key: "app"
              operator: "Equal"
              value: "ingest-2"
              effect: "NoExecute"
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: cloud.google.com/gke-nodepool
                        operator: In
                        values:
                          - ingest-2
          containers:
            - env:
                - name: SHARD_BUCKET_PATH
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: POD_NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
                - name: CLUSTER
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.labels['clickhouse.altinity.com/cluster']
                - name: GCS_ENDPOINT
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
                - name: GCS_ENDPOINT_6M_RETENTION
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
                - name: GCS_ENDPOINT_1Y_RETENTION
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
              envFrom:
                - secretRef:
                    name: clickhouse
              image: clickhouse-server:24.5.1.1763
              name: clickhouse
              startupProbe:
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                failureThreshold: 100
                periodSeconds: 9
                timeoutSeconds: 1
              livenessProbe:
                failureThreshold: 100
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                initialDelaySeconds: 60
                periodSeconds: 30
                successThreshold: 1
                timeoutSeconds: 1
              readinessProbe:
                failureThreshold: 300
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                initialDelaySeconds: 10
                periodSeconds: 30
                successThreshold: 1
                timeoutSeconds: 1
              ports:
                - name: "metrics"
                  containerPort: 8888
              resources:
                limits:
                  memory: 10Gi
                requests:
                  cpu: 1000m
                  memory: 10Gi
              volumeMounts:
                - name: cache
                  mountPath: /var/lib/clickhouse/disks/gcscache
                - name: cache-6m
                  mountPath: /var/lib/clickhouse/disks/gcscache_6m
                - name: cache-1y
                  mountPath: /var/lib/clickhouse/disks/gcscache_1y
      - name: query-pod-template
        metadata:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/schema: "http"
            prometheus.io/port: "8888"
            prometheus.io/path: "/metrics"
        spec:
          tolerations:
            - key: "app"
              operator: "Equal"
              value: "ingest"
              effect: "NoExecute"
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: cloud.google.com/gke-nodepool
                        operator: In
                        values:
                          - ingest
          containers:
            - env:
                - name: SHARD_BUCKET_PATH
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: CLUSTER
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.labels['clickhouse.altinity.com/cluster']
                - name: GCS_ENDPOINT
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
                - name: GCS_ENDPOINT_6M_RETENTION
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
                - name: GCS_ENDPOINT_1Y_RETENTION
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
              envFrom:
                - secretRef:
                    name: clickhouse
              image: clickhouse-server:24.5.1.1763
              name: clickhouse
              startupProbe:
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                failureThreshold: 40
                periodSeconds: 3
                timeoutSeconds: 1
              livenessProbe:
                failureThreshold: 10
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                initialDelaySeconds: 60
                periodSeconds: 3
                successThreshold: 1
                timeoutSeconds: 1
              readinessProbe:
                failureThreshold: 3
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                initialDelaySeconds: 10
                periodSeconds: 3
                successThreshold: 1
                timeoutSeconds: 1
              ports:
                - name: "metrics"
                  containerPort: 8888
              resources:
                limits:
                  memory: 10Gi
                requests:
                  cpu: 1000m
                  memory: 10Gi
    volumeClaimTemplates:
      - name: data-7
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: data-6
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: data-5
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
      - name: data-4
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: data-3
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: data-2
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: data-1
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: query-data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: cache
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: cache-6m
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: cache-1y
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: logs
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
I would definitely consider moving to multiple CHI objects and having the shared configuration generated by some (git)ops tool. If I understand it correctly, this started after adding clusters 5, 6, and 7?
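As a sketch of that direction only (this assumes the operator's ClickHouseInstallationTemplate / useTemplates mechanism, which is not discussed elsewhere in this thread, and all names below are made up), the shared templates could live in one template object that each per-cluster CHI references:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallationTemplate"
metadata:
  name: "shared-clickhouse"
spec:
  templates:
    podTemplates:
      - name: ingest-pod-template
        # ...shared probes, env, resources, volume mounts...
---
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "clickhouse-cluster-7"
spec:
  useTemplates:
    - name: "shared-clickhouse"
  configuration:
    clusters:
      - name: "7"
        layout:
          shardsCount: 14
          replicasCount: 2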
@tanner-bruce, did it help after splitting the clusters into multiple CHIs?
We have a fairly large single CHI and are now getting this error in the operator as well. A large chunk of the status section is the storage.xml from normalizedCompleted, around 420,000 bytes of the 2,097,152-byte maximum.
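To make the failure mode concrete, the CHI status ends up carrying the full normalized spec, so the inlined storage.xml is duplicated there. Roughly (illustrative and heavily truncated; the exact nesting under normalizedCompleted is an assumption and may differ by operator version):

status:
  normalizedCompleted:
    # the operator records the full normalized CHI here, including every
    # entry under configuration.files, so the ~420 KB storage.xml is
    # repeated inside the object's status
    spec:
      configuration:
        files:
          conf.d/storage.xml: "<clickhouse> <storage_configuration> ... </storage_configuration> </clickhouse>"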