tanner-bruce opened 4 months ago
This also seems to affect sections of the code like https://github.com/Altinity/clickhouse-operator/blob/0.24.0/pkg/model/chop_config.go#L151, where the status is used to compare the old and new configuration, causing unnecessary restarts.
One potential solution could be to allow configurable state storage, so that object storage can be used for state instead of the Kubernetes status.
One workaround can be to bake some of your configuration into the ClickHouse image itself, reducing the status size below the maximum object size allowed by the API server.
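As an illustration only, a minimal sketch of that workaround, assuming a hypothetical custom image that already ships the storage configuration under /etc/clickhouse-server/config.d/ (the image tag and path below are assumptions, not something from this issue):

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "clickhouse"
spec:
  configuration:
    # storage.xml is no longer inlined under `files:`; it is copied into
    # /etc/clickhouse-server/config.d/ when the custom image is built, so it
    # never ends up in the CHI status (e.g. in normalizedCompleted).
    clusters:
      - name: "7"
        layout:
          shardsCount: 14
          replicasCount: 2
  templates:
    podTemplates:
      - name: ingest-pod-template
        spec:
          containers:
            - name: clickhouse
              # hypothetical tag for an image that bundles storage.xml
              image: my-registry/clickhouse-server:24.5.1.1763-storage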
@tanner-bruce, is it possible to share the full CHI? I can see you are using configuration at the shard level -- what is the reason for that? Maybe the CHI can be made more compact.
@alex-zaitsev I'm not sure what shard-level configuration you mean. We have some different node types, and we have different disk sizes for some clusters.
@ondrej-smola That is a good idea; we could certainly do that for the storage XML, but I think that is about it.
Currently we are looking at splitting the different clusters into their own CHIs and then using cluster discovery to link them to our query pods, but we are not sure how to migrate to that. A rough sketch of what one split-out CHI could look like is below.
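For illustration only (the name is hypothetical, and the cluster-discovery / remote_servers wiring for the query tier is omitted because we have not settled on it), a split-out, per-cluster CHI might look roughly like this:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "clickhouse-cluster-7"   # hypothetical per-cluster name
spec:
  configuration:
    clusters:
      - name: "7"
        layout:
          shardsCount: 14
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-7
  templates:
    volumeClaimTemplates:
      - name: data-7
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd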
Here is our full CHI, redacted mildly.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "clickhouse"
spec:
  configuration:
    profiles:
      clickhouse_operator/skip_unavailable_shards: 1
      materialize_ttl_after_modify: 0
      default/skip_unavailable_shards: 1
      readonly/readonly: 1
    settings:
      async_insert_threads: 30
      background_common_pool_size: 24
      background_distributed_schedule_pool_size: 24
      background_move_pool_size: 12
      background_pool_size: 36
      background_schedule_pool_size: 24
      logger/level: debug
      max_table_size_to_drop: 0
      prometheus/asynchronous_metrics: true
      prometheus/endpoint: /metrics
      prometheus/events: true
      prometheus/metrics: true
      prometheus/port: "8888"
      prometheus/status_info: true
    clusters:
      - name: "7"
        layout:
          shardsCount: 14
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-7
      - name: "6"
        layout:
          shardsCount: 14
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-6
      - name: "5"
        layout:
          shardsCount: 14
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-5
          podTemplate: ingest-2-pod-template
      - name: "4"
        layout:
          shardsCount: 8
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-4
      - name: "3"
        layout:
          shardsCount: 8
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-3
      - name: "2"
        layout:
          shardsCount: 2
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-2
      - name: "1"
        layout:
          shardsCount: 1
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-1
      - name: "query"
        templates:
          clusterServiceTemplate: query-service
          dataVolumeClaimTemplate: query-data
          podTemplate: query-pod-template
        layout:
          shardsCount: 4
          replicasCount: 1
    files:
      conf.d/storage.xml: "<clickhouse> <storage_configuration> <disks> <gcs> <type>s3</type> <access_key_id from_env=\"GCS_ACCESS_KEY\" /> <secret_access_key from_env=\"GCS_SECRET_KEY\" /> <endpoint from_env=\"GCS_ENDPOINT\" /> <metadata_path>/var/lib/clickhouse/disks/gcs/</metadata_path> <support_batch_delete>false</support_batch_delete> </gcs> <gcs_6m> <type>s3</type> <access_key_id from_env=\"GCS_ACCESS_KEY\" /> <secret_access_key from_env=\"GCS_SECRET_KEY\" /> <endpoint from_env=\"GCS_ENDPOINT_6M_RETENTION\" /> <metadata_path>/var/lib/clickhouse/disks/gcs_6m/</metadata_path> <support_batch_delete>false</support_batch_delete> </gcs_6m> <gcs_1y> <type>s3</type> <access_key_id from_env=\"GCS_ACCESS_KEY\" /> <secret_access_key from_env=\"GCS_SECRET_KEY\" /> <endpoint from_env=\"GCS_ENDPOINT_1Y_RETENTION\" /> <metadata_path>/var/lib/clickhouse/disks/gcs_1y/</metadata_path> <support_batch_delete>false</support_batch_delete> </gcs_1y> <gcs_cache> <type>cache</type> <disk>gcs</disk> <cache_enabled>true</cache_enabled> <data_cache_enabled>true</data_cache_enabled> <enable_filesystem_cache>true</enable_filesystem_cache> <path>/var/lib/clickhouse/disks/gcscache/</path> <enable_filesystem_cache_log>true</enable_filesystem_cache_log> <max_size>10Gi</max_size> </gcs_cache> <gcs_6m_cache> <type>cache</type> <disk>gcs_6m</disk> <cache_enabled>true</cache_enabled> <data_cache_enabled>true</data_cache_enabled> <enable_filesystem_cache>true</enable_filesystem_cache> <path>/var/lib/clickhouse/disks/gcscache_6m/</path> <enable_filesystem_cache_log>true</enable_filesystem_cache_log> <max_size>10Gi</max_size> </gcs_6m_cache> <gcs_1y_cache> <type>cache</type> <disk>gcs_1y</disk> <cache_enabled>true</cache_enabled> <data_cache_enabled>true</data_cache_enabled> <enable_filesystem_cache>true</enable_filesystem_cache> <path>/var/lib/clickhouse/disks/gcscache_1y/</path> <enable_filesystem_cache_log>true</enable_filesystem_cache_log> <max_size>95Gi</max_size> </gcs_1y_cache> <ssd> <type>local</type> <path>/var/lib/clickhouse/disks/localssd/</path> </ssd> </disks> <policies> <storage_main> <volumes> <ssd> <disk>ssd</disk> </ssd> <gcs> <disk>gcs_cache</disk> <perform_ttl_move_on_insert>0</perform_ttl_move_on_insert> <prefer_not_to_merge>true</prefer_not_to_merge> </gcs> <gcs_6m> <disk>gcs_6m_cache</disk> <perform_ttl_move_on_insert>0</perform_ttl_move_on_insert> <prefer_not_to_merge>true</prefer_not_to_merge> </gcs_6m> <gcs_1y> <disk>gcs_1y_cache</disk> <perform_ttl_move_on_insert>0</perform_ttl_move_on_insert> <prefer_not_to_merge>true</prefer_not_to_merge> </gcs_1y> </volumes> <move_factor>0.1</move_factor> </storage_main> </policies> </storage_configuration> </clickhouse>"
    zookeeper:
      nodes:
        - host: clickhouse-keeper
          port: 2181
      session_timeout_ms: 30000
      operation_timeout_ms: 10000
      root: /root
      identity: user:password
    users:
      migrations/access_management: 1
      migrations/k8s_secret_password: clickhouse/clickhouse
      migrations/networks/ip: "::/0"
      exporter/k8s_secret_password: clickhouse/clickhouse
      exporter/networks/ip: "::/0"
      grafana/k8s_secret_password: clickhouse/clickhouse
      grafana/networks/ip: "::/0"
      grafana/grants/query:
        - GRANT SELECT ON *.*
        - REVOKE ALL PRIVILEGES ON .
        - REVOKE ALL PRIVILEGES ON .
        - REVOKE ALL PRIVILEGES ON .
        - REVOKE ALL PRIVILEGES ON .
        - REVOKE ALL PRIVILEGES ON .
        - REVOKE ALL PRIVILEGES ON .
      api/k8s_secret_password: clickhouse/clickhouse
      api/networks/ip: "::/0"
  defaults:
    templates:
      logVolumeClaimTemplate: logs
      podTemplate: ingest-pod-template
      serviceTemplate: default-service
      clusterServiceTemplate: cluster-ingest-service
    storageManagement:
      reclaimPolicy: Retain
  templates:
    serviceTemplates:
      - name: default-service
        generateName: clickhouse-{chi}
        metadata:
          annotations:
            networking.gke.io/load-balancer-type: "Internal"
            networking.gke.io/internal-load-balancer-allow-global-access: "true"
        spec:
          ports:
            - name: http
              port: 8123
            - name: tcp
              port: 9000
          type: LoadBalancer
      - name: cluster-ingest-service
        generateName: ingest-{cluster}-{chi}
        metadata:
          annotations:
            networking.gke.io/load-balancer-type: "Internal"
            networking.gke.io/internal-load-balancer-allow-global-access: "true"
        spec:
          ports:
            - name: http
              port: 8123
            - name: tcp
              port: 9000
          type: LoadBalancer
      - name: query-service
        generateName: query-{chi}
        metadata:
          annotations:
            networking.gke.io/load-balancer-type: "Internal"
            networking.gke.io/internal-load-balancer-allow-global-access: "true"
        spec:
          ports:
            - name: http
              port: 8123
            - name: tcp
              port: 9000
          type: LoadBalancer
    podTemplates:
      - name: ingest-pod-template
        metadata:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/schema: "http"
            prometheus.io/port: "8888"
            prometheus.io/path: "/metrics"
        spec:
          tolerations:
            - key: "app"
              operator: "Equal"
              value: "ingest"
              effect: "NoExecute"
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: cloud.google.com/gke-nodepool
                        operator: In
                        values:
                          - ingest
          containers:
            - env:
                - name: SHARD_BUCKET_PATH
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: POD_NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
                - name: CLUSTER
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.labels['clickhouse.altinity.com/cluster']
                - name: GCS_ENDPOINT
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
                - name: GCS_ENDPOINT_6M_RETENTION
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
                - name: GCS_ENDPOINT_1Y_RETENTION
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
              envFrom:
                - secretRef:
                    name: clickhouse
              image: clickhouse-server:24.5.1.1763
              name: clickhouse
              startupProbe:
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                failureThreshold: 100
                periodSeconds: 9
                timeoutSeconds: 1
              livenessProbe:
                failureThreshold: 100
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                initialDelaySeconds: 60
                periodSeconds: 30
                successThreshold: 1
                timeoutSeconds: 1
              readinessProbe:
                failureThreshold: 300
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                initialDelaySeconds: 10
                periodSeconds: 30
                successThreshold: 1
                timeoutSeconds: 1
              ports:
                - name: "metrics"
                  containerPort: 8888
              resources:
                limits:
                  memory: 10Gi
                requests:
                  cpu: 1000m
                  memory: 10Gi
              volumeMounts:
                - name: cache
                  mountPath: /var/lib/clickhouse/disks/gcscache
                - name: cache-6m
                  mountPath: /var/lib/clickhouse/disks/gcscache_6m
                - name: cache-1y
                  mountPath: /var/lib/clickhouse/disks/gcscache_1y
      - name: ingest-2-pod-template
        metadata:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/schema: "http"
            prometheus.io/port: "8888"
            prometheus.io/path: "/metrics"
        spec:
          tolerations:
            - key: "app"
              operator: "Equal"
              value: "ingest-2"
              effect: "NoExecute"
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: cloud.google.com/gke-nodepool
                        operator: In
                        values:
                          - ingest-2
          containers:
            - env:
                - name: SHARD_BUCKET_PATH
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: POD_NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
                - name: CLUSTER
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.labels['clickhouse.altinity.com/cluster']
                - name: GCS_ENDPOINT
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
                - name: GCS_ENDPOINT_6M_RETENTION
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
                - name: GCS_ENDPOINT_1Y_RETENTION
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
              envFrom:
                - secretRef:
                    name: clickhouse
              image: clickhouse-server:24.5.1.1763
              name: clickhouse
              startupProbe:
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                failureThreshold: 100
                periodSeconds: 9
                timeoutSeconds: 1
              livenessProbe:
                failureThreshold: 100
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                initialDelaySeconds: 60
                periodSeconds: 30
                successThreshold: 1
                timeoutSeconds: 1
              readinessProbe:
                failureThreshold: 300
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                initialDelaySeconds: 10
                periodSeconds: 30
                successThreshold: 1
                timeoutSeconds: 1
              ports:
                - name: "metrics"
                  containerPort: 8888
              resources:
                limits:
                  memory: 10Gi
                requests:
                  cpu: 1000m
                  memory: 10Gi
              volumeMounts:
                - name: cache
                  mountPath: /var/lib/clickhouse/disks/gcscache
                - name: cache-6m
                  mountPath: /var/lib/clickhouse/disks/gcscache_6m
                - name: cache-1y
                  mountPath: /var/lib/clickhouse/disks/gcscache_1y
      - name: query-pod-template
        metadata:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/schema: "http"
            prometheus.io/port: "8888"
            prometheus.io/path: "/metrics"
        spec:
          tolerations:
            - key: "app"
              operator: "Equal"
              value: "ingest"
              effect: "NoExecute"
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: cloud.google.com/gke-nodepool
                        operator: In
                        values:
                          - ingest
          containers:
            - env:
                - name: SHARD_BUCKET_PATH
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: CLUSTER
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.labels['clickhouse.altinity.com/cluster']
                - name: GCS_ENDPOINT
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
                - name: GCS_ENDPOINT_6M_RETENTION
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
                - name: GCS_ENDPOINT_1Y_RETENTION
                  value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
              envFrom:
                - secretRef:
                    name: clickhouse
              image: clickhouse-server:24.5.1.1763
              name: clickhouse
              startupProbe:
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                failureThreshold: 40
                periodSeconds: 3
                timeoutSeconds: 1
              livenessProbe:
                failureThreshold: 10
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                initialDelaySeconds: 60
                periodSeconds: 3
                successThreshold: 1
                timeoutSeconds: 1
              readinessProbe:
                failureThreshold: 3
                httpGet:
                  path: /ping
                  port: http
                  scheme: HTTP
                initialDelaySeconds: 10
                periodSeconds: 3
                successThreshold: 1
                timeoutSeconds: 1
              ports:
                - name: "metrics"
                  containerPort: 8888
              resources:
                limits:
                  memory: 10Gi
                requests:
                  cpu: 1000m
                  memory: 10Gi
    volumeClaimTemplates:
      - name: data-7
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: data-6
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: data-5
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
      - name: data-4
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: data-3
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: data-2
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: data-1
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: query-data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: cache
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: cache-6m
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: cache-1y
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: ssd
      - name: logs
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
I would definitely consider moving to multiple CHI objects and having the shared configuration generated by some (git)ops tool. If I understand it correctly, this started after adding clusters 5, 6, and 7?
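As a sketch of that direction only (this assumes the operator's ClickHouseInstallationTemplate / useTemplates mechanism, which is not discussed elsewhere in this thread, and all names below are made up), the shared templates could live in one template object that each per-cluster CHI references:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallationTemplate"
metadata:
  name: "shared-clickhouse"
spec:
  templates:
    podTemplates:
      - name: ingest-pod-template
        # ...shared probes, env, resources, volume mounts...
---
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "clickhouse-cluster-7"
spec:
  useTemplates:
    - name: "shared-clickhouse"
  configuration:
    clusters:
      - name: "7"
        layout:
          shardsCount: 14
          replicasCount: 2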
@tanner-bruce, did it help after splitting the clusters into multiple CHIs?
We have a fairly large single CHI and are now getting this error in the operator as well. A large chunk of the status section is the storage.xml from normalizedCompleted, around 420,000 bytes of the 2,097,152-byte maximum.
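To make the failure mode concrete, the CHI status ends up carrying the full normalized spec, so the inlined storage.xml is duplicated there. Roughly (illustrative and heavily truncated; the exact nesting under normalizedCompleted is an assumption and may differ by operator version):

status:
  normalizedCompleted:
    # the operator records the full normalized CHI here, including every
    # entry under configuration.files, so the ~420 KB storage.xml is
    # repeated inside the object's status
    spec:
      configuration:
        files:
          conf.d/storage.xml: "<clickhouse> <storage_configuration> ... </storage_configuration> </clickhouse>"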