Closed fidiego closed 2 weeks ago
Could you share the output of `kubectl get chi -n clickhouse clickhouse-installation -o yaml`?
@Slach sure thing.
```yaml
apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"clickhouse.altinity.com/v1","kind":"ClickHouseInstallation","metadata":{"annotations":{},"name":"clickhouse-installation","namespace":"clickhouse"},"spec":{"configuration":{"clusters":[{"layout":{"replicasCount":3,"shardsCount":3},"name":"simple","templates":{"podTemplate":"clickhouse:24.3.7","serviceTemplate":"clickhouse:24.3.7"}}],"files":{"config.d/disks.xml":"\u003cclickhouse\u003e\n \u003cstorage_configuration\u003e\n \u003cdisks\u003e\n \u003cs3_disk\u003e\n \u003ctype\u003es3\u003c/type\u003e\n \u003cendpoint\u003ehttps://company-clickhouse-env.s3.amazonaws.com/tables/\u003c/endpoint\u003e\n \u003cuse_environment_credentials\u003etrue\u003c/use_environment_credentials\u003e\n \u003cmetadata_path\u003e/var/lib/clickhouse/disks/s3_disk/\u003c/metadata_path\u003e\n \u003c/s3_disk\u003e\n \u003cs3_cache\u003e\n \u003ctype\u003ecache\u003c/type\u003e\n \u003cdisk\u003es3_disk\u003c/disk\u003e\n \u003cpath\u003e/var/lib/clickhouse/disks/s3_cache/\u003c/path\u003e\n \u003cmax_size\u003e10Gi\u003c/max_size\u003e\n \u003c/s3_cache\u003e\n \u003c/disks\u003e\n \u003cpolicies\u003e\n \u003cs3_main\u003e\n \u003cvolumes\u003e\n \u003cmain\u003e\n \u003cdisk\u003es3_disk\u003c/disk\u003e\n \u003c/main\u003e\n \u003c/volumes\u003e\n \u003c/s3_main\u003e\n \u003c/policies\u003e\n \u003c/storage_configuration\u003e\n\u003c/clickhouse\u003e\n","config.d/s3.xml":"\u003cclickhouse\u003e\n \u003cs3\u003e\n \u003cuse_environment_credentials\u003etrue\u003c/use_environment_credentials\u003e\n 
\u003c/s3\u003e\n\u003c/clickhouse\u003e\n"},"users":{"username/networks/ip":["0.0.0.0/0"],"username/password_sha256_hex":"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}},"defaults":{"templates":{"dataVolumeClaimTemplate":"data-volume-template","logVolumeClaimTemplate":"log-volume-template","serviceTemplate":"clickhouse:24.3.7"}},"templates":{"podTemplates":[{"name":"clickhouse:24.3.7","spec":{"containers":[{"env":[{"name":"CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS","value":"true"}],"image":"clickhouse/clickhouse-server:24.3.7","name":"clickhouse","volumeMounts":[{"mountPath":"/var/lib/clickhouse","name":"data-volume-template"},{"mountPath":"/var/log/clickhouse-server","name":"log-volume-template"},{"mountPath":"/docker-entrypoint-initdb.d","name":"bootstrap-configmap-volume"}]}],"nodeSelector":{"clickhouse-installation":"true"},"tolerations":[{"effect":"NoSchedule","key":"installation","operator":"Equal","value":"clickhouse-installation"}],"volumes":[{"configMap":{"name":"bootstrap-configmap"},"name":"bootstrap-configmap-volume"}]}}],"serviceTemplates":[{"metadata":{"annotations":{"external-dns.alpha.kubernetes.io/internal-hostname":"clickhouse.company.us-west-2.env.company.cloud","external-dns.alpha.kubernetes.io/ttl":"60"}},"name":"clickhouse:24.3.7","spec":{"ports":[{"name":"http","port":8123},{"name":"client","port":9000}]}}],"volumeClaimTemplates":[{"name":"data-volume-template","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"100Gi"}}}},{"name":"log-volume-template","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"4Gi"}}}}]}}}
  creationTimestamp: "2024-08-21T13:49:55Z"
  finalizers:
  - finalizer.clickhouseinstallation.altinity.com
  generation: 1
  name: clickhouse-installation
  namespace: clickhouse
  resourceVersion: "415622675"
  uid: 75211649-036d-4cb3-a535-d9a4bab082c0
spec:
  configuration:
    clusters:
    - layout:
        replicasCount: 3
        shardsCount: 3
      name: simple
      templates:
        podTemplate: clickhouse:24.3.7
        serviceTemplate: clickhouse:24.3.7
    files:
      config.d/disks.xml: |
        <clickhouse>
          <storage_configuration>
            <disks>
              <s3_disk>
                <type>s3</type>
                <endpoint>https://company-clickhouse-env.s3.amazonaws.com/tables/</endpoint>
                <use_environment_credentials>true</use_environment_credentials>
                <metadata_path>/var/lib/clickhouse/disks/s3_disk/</metadata_path>
              </s3_disk>
              <s3_cache>
                <type>cache</type>
                <disk>s3_disk</disk>
                <path>/var/lib/clickhouse/disks/s3_cache/</path>
                <max_size>10Gi</max_size>
              </s3_cache>
            </disks>
            <policies>
              <s3_main>
                <volumes>
                  <main>
                    <disk>s3_disk</disk>
                  </main>
                </volumes>
              </s3_main>
            </policies>
          </storage_configuration>
        </clickhouse>
  templates:
    volumeClaimTemplates:
    - name: log-volume-template
      spec:
        resources:
          requests:
            storage: 4Gi
```
Maybe this is out of space? Could you share the output of `kubectl get storageclass -o wide`?
I see. Easy fix, in that case.
```txt
NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2 (default)   kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  2y135d
```
Could you share `kubectl get storageclass gp2 -o yaml`?
of course @Slach
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"name":"gp2"},"parameters":{"fsType":"ext4","type":"gp2"},"provisioner":"kubernetes.io/aws-ebs","volumeBindingMode":"WaitForFirstConsumer"}
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2022-05-05T16:36:20Z"
  name: gp2
  resourceVersion: "325"
  uid: 7249518f-xxxx-xxxx-xxxx-c150dc4feedb
parameters:
  fsType: ext4
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```
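Note that `ALLOWVOLUMEEXPANSION` is `false` on this StorageClass, so raising a PVC's storage request alone will not resize the underlying EBS volume. A minimal sketch of enabling expansion first, assuming your EBS driver supports online resize (the PVC name below is taken from the pod description in this issue; verify before running against production):

```shell
# Allow PVCs backed by gp2 to be expanded in place
kubectl patch storageclass gp2 \
  -p '{"allowVolumeExpansion": true}'

# Then raise the log PVC request for the crashing replica
kubectl patch pvc -n clickhouse \
  log-volume-template-chi-clickhouse-installation-simple-1-1-0 \
  -p '{"spec":{"resources":{"requests":{"storage":"8Gi"}}}}'
```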
Try increasing the size of the log volume in `log-volume-template` under `volumeClaimTemplates`:

```yaml
resources:
  requests:
    storage: 8Gi
```

or make log rotation more aggressive, something like:

```yaml
spec:
  configuration:
    settings:
      logger/size: 1000M
      logger/count: 3
```
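To confirm whether the log volume is actually full, one quick check, using the container and mount names from the pod description in this issue (the `clickhouse-log` sidecar stays Running even while the main container crash-loops), would be:

```shell
# Check free space on the log PVC from the healthy sidecar container
kubectl exec -n clickhouse chi-clickhouse-installation-simple-1-1-0 \
  -c clickhouse-log -- df -h /var/log/clickhouse-server
```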
Description
I have a `clickhouse-installation` (version 24.3.7) running on an EKS cluster. One of my `StatefulSet`s is stuck in `CrashLoopBackOff`. Restarting the pod has no effect. I am hesitant to delete it manually in case I lose the underlying volume and cause data loss. What is the recommended course of action?
Error Logs
State
```txt
Name:             chi-clickhouse-installation-simple-1-1-0
Namespace:        clickhouse
Priority:         0
Service Account:  default
Node:             ip-10-132-130-000.us-west-2.compute.internal/10.132.130.000
Start Time:       Thu, 12 Sep 2024 16:38:00 -0500
Labels:           apps.kubernetes.io/pod-index=0
                  clickhouse.altinity.com/app=chop
                  clickhouse.altinity.com/chi=clickhouse-installation
                  clickhouse.altinity.com/cluster=simple
                  clickhouse.altinity.com/namespace=clickhouse
                  clickhouse.altinity.com/ready=yes
                  clickhouse.altinity.com/replica=1
                  clickhouse.altinity.com/shard=1
                  controller-revision-hash=chi-clickhouse-installation-simple-1-1-56cd87d759
                  statefulset.kubernetes.io/pod-name=chi-clickhouse-installation-simple-1-1-0
Annotations:
Status: Running
IP: 10.132.130.00
IPs:
IP: 10.132.130.00
Controlled By: StatefulSet/chi-clickhouse-installation-simple-1-1
Containers:
clickhouse:
Container ID: containerd://5a3ab0c1331f8e1dc25c610edcccbcdc059822a49c80b47f90e882af9e046471
Image: clickhouse/clickhouse-server:24.3.7
Image ID: docker.io/clickhouse/clickhouse-server@sha256:e55fe12eb1964663d5f4cb8b633444b6dfd1233124bd3cbbf1daacfa815d3c59
Ports: 9000/TCP, 8123/TCP, 9009/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
State: Running
Started: Thu, 12 Sep 2024 20:39:14 -0500
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Thu, 12 Sep 2024 20:31:59 -0500
Finished: Thu, 12 Sep 2024 20:33:58 -0500
Ready: False
Restart Count: 56
Liveness: http-get http://:http/ping delay=60s timeout=1s period=3s #success=1 #failure=10
Readiness: http-get http://:http/ping delay=10s timeout=1s period=3s #success=1 #failure=3
Environment:
CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS: true
AWS_STS_REGIONAL_ENDPOINTS: regional
AWS_DEFAULT_REGION: us-west-2
AWS_REGION: us-west-2
AWS_ROLE_ARN: arn:aws:iam::0000000000000:role/company-clickhouse-role-prod
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/docker-entrypoint-initdb.d from bootstrap-configmap-volume (rw)
/etc/clickhouse-server/conf.d/ from chi-clickhouse-installation-deploy-confd-simple-1-1 (rw)
/etc/clickhouse-server/config.d/ from chi-clickhouse-installation-common-configd (rw)
/etc/clickhouse-server/users.d/ from chi-clickhouse-installation-common-usersd (rw)
/var/lib/clickhouse from data-volume-template (rw)
/var/log/clickhouse-server from log-volume-template (rw)
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8sp2v (ro)
clickhouse-log:
Container ID: containerd://dceeebea25731fe1a0bbba3b0ca2789a626bdaeba8e6ebf358fa6cb429cb9f51
Image: registry.access.redhat.com/ubi8/ubi-minimal:latest
Image ID: registry.access.redhat.com/ubi8/ubi-minimal@sha256:a47c89f02b39a98290f88204ed3d162845db0a0c464b319c2596cfd1e94b444e
Port:
Host Port:
Command:
/bin/sh
-c
--
Args:
while true; do sleep 30; done;
State: Running
Started: Thu, 12 Sep 2024 16:38:35 -0500
Ready: True
Restart Count: 0
Environment:
AWS_STS_REGIONAL_ENDPOINTS: regional
AWS_DEFAULT_REGION: us-west-2
AWS_REGION: us-west-2
AWS_ROLE_ARN: arn:aws:iam::000000000000:role/company-clickhouse-role-prod
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/var/lib/clickhouse from data-volume-template (rw)
/var/log/clickhouse-server from log-volume-template (rw)
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8sp2v (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
aws-iam-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 86400
data-volume-template:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-volume-template-chi-clickhouse-installation-simple-1-1-0
ReadOnly: false
log-volume-template:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: log-volume-template-chi-clickhouse-installation-simple-1-1-0
ReadOnly: false
bootstrap-configmap-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: bootstrap-configmap
Optional: false
chi-clickhouse-installation-common-configd:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: chi-clickhouse-installation-common-configd
Optional: false
chi-clickhouse-installation-common-usersd:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: chi-clickhouse-installation-common-usersd
Optional: false
chi-clickhouse-installation-deploy-confd-simple-1-1:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: chi-clickhouse-installation-deploy-confd-simple-1-1
Optional: false
kube-api-access-8sp2v:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: clickhouse-installation=true
Tolerations: installation=clickhouse-installation:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 7m26s (x2104 over 4h1m) kubelet Readiness probe failed: Get "http://10.132.130.00:8123/ping": dial tcp 10.132.130.00:8123: connect: connection refused
Warning BackOff 2m20s (x574 over 3h48m) kubelet Back-off restarting failed container clickhouse in pod chi-clickhouse-installation-simple-1-1-0_clickhouse(510d3812-9037-4452-adc9-417531893542)
```
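Exit code 137 is SIGKILL; combined with the ~2-minute runtime and the failing readiness probe above, that usually points at either the liveness probe killing a server that is still starting up, or an OOM kill (the pod's QoS class is BestEffort, so it has no memory reservation). A hedged first diagnostic step, using the pod and container names from the description above, would be to pull the server error log through the healthy `clickhouse-log` sidecar and check the recorded termination reason:

```shell
# Tail the ClickHouse error log via the sidecar, which keeps running
# while the main container crash-loops
kubectl exec -n clickhouse chi-clickhouse-installation-simple-1-1-0 \
  -c clickhouse-log -- tail -n 200 /var/log/clickhouse-server/clickhouse-server.err.log

# Check whether the clickhouse container was OOMKilled or probe-killed
kubectl get pod -n clickhouse chi-clickhouse-installation-simple-1-1-0 \
  -o jsonpath='{.status.containerStatuses[?(@.name=="clickhouse")].lastState.terminated.reason}'
```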