Let's check: is /var/lib/clickhouse present in the pod mounts? Could you share?
oc -n clickhouse get pod chi-clickhouse-olap-clickhouse-olap-0-0-0 -o yaml
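Or, to list just the mount paths per container without dumping the whole manifest (a hedged one-liner, assuming standard kubectl/oc JSONPath support):
oc -n clickhouse get pod chi-clickhouse-olap-clickhouse-olap-0-0-0 -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.volumeMounts[*].mountPath}{"\n"}{end}'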
It is there
pratikraj@Pratiks-MacBook-Pro common % oc get pod -n clickhouse chi-clickhouse-olap-clickhouse-olap-0-0-0 -o yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.254.22.217/22"],"mac_address":"0a:58:0a:fe:16:d9","gateway_ips":["10.254.20.1"],"routes":[{"dest":"10.254.0.0/16","nextHop":"10.254.20.1"},{"dest":"172.30.0.0/16","nextHop":"10.254.20.1"},{"dest":"100.64.0.0/16","nextHop":"10.254.20.1"}],"ip_address":"10.254.22.217/22","gateway_ip":"10.254.20.1"}}'
k8s.v1.cni.cncf.io/network-status: |-
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"10.254.22.217"
],
"mac": "0a:58:0a:fe:16:d9",
"default": true,
"dns": {}
}]
openshift.io/scc: restricted-v2
seccomp.security.alpha.kubernetes.io/pod: runtime/default
creationTimestamp: "2024-07-24T19:42:43Z"
generateName: chi-clickhouse-olap-clickhouse-olap-0-0-
labels:
clickhouse.altinity.com/app: chop
clickhouse.altinity.com/chi: clickhouse-olap
clickhouse.altinity.com/cluster: clickhouse-olap
clickhouse.altinity.com/namespace: clickhouse
clickhouse.altinity.com/ready: "yes"
clickhouse.altinity.com/replica: "0"
clickhouse.altinity.com/shard: "0"
controller-revision-hash: chi-clickhouse-olap-clickhouse-olap-0-0-6d59bbb7c
statefulset.kubernetes.io/pod-name: chi-clickhouse-olap-clickhouse-olap-0-0-0
name: chi-clickhouse-olap-clickhouse-olap-0-0-0
namespace: clickhouse
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: StatefulSet
name: chi-clickhouse-olap-clickhouse-olap-0-0
uid: 3410fc06-876c-4748-814c-c4b511685b60
resourceVersion: "1682310"
uid: 25afed6c-7481-4284-99a3-97caf8516279
spec:
containers:
- image: clickhouse/clickhouse-server:latest
imagePullPolicy: Always
livenessProbe:
failureThreshold: 10
httpGet:
path: /ping
port: http
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 3
successThreshold: 1
timeoutSeconds: 1
name: clickhouse
ports:
- containerPort: 9000
name: tcp
protocol: TCP
- containerPort: 8123
name: http
protocol: TCP
- containerPort: 9009
name: interserver
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /ping
port: http
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 3
successThreshold: 1
timeoutSeconds: 1
resources: {}
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
runAsNonRoot: true
runAsUser: 1000730000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/clickhouse-server/config.d/
name: chi-clickhouse-olap-common-configd
- mountPath: /etc/clickhouse-server/users.d/
name: chi-clickhouse-olap-common-usersd
- mountPath: /etc/clickhouse-server/conf.d/
name: chi-clickhouse-olap-deploy-confd-clickhouse-olap-0-0
- mountPath: /var/lib/clickhouse
name: data-volume-template
- mountPath: /var/log/clickhouse-server
name: log-volume-template
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-r2djq
readOnly: true
- args:
- while true; do sleep 30; done;
command:
- /bin/sh
- -c
- --
image: registry.access.redhat.com/ubi8/ubi-minimal:latest
imagePullPolicy: Always
name: clickhouse-log
resources: {}
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
runAsNonRoot: true
runAsUser: 1000730000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/clickhouse
name: data-volume-template
- mountPath: /var/log/clickhouse-server
name: log-volume-template
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-r2djq
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
hostAliases:
- hostnames:
- chi-clickhouse-olap-clickhouse-olap-0-0
ip: 127.0.0.1
hostname: chi-clickhouse-olap-clickhouse-olap-0-0-0
imagePullSecrets:
- name: default-dockercfg-v9z72
nodeName: worker1.gi-tracing-poc.cp.fyre.ibm.com
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1000730000
seLinuxOptions:
level: s0:c27,c14
seccompProfile:
type: RuntimeDefault
serviceAccount: default
serviceAccountName: default
subdomain: chi-clickhouse-olap-clickhouse-olap-0-0
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: data-volume-template
persistentVolumeClaim:
claimName: data-volume-template-chi-clickhouse-olap-clickhouse-olap-0-0-0
- name: log-volume-template
persistentVolumeClaim:
claimName: log-volume-template-chi-clickhouse-olap-clickhouse-olap-0-0-0
- configMap:
defaultMode: 420
name: chi-clickhouse-olap-common-configd
name: chi-clickhouse-olap-common-configd
- configMap:
defaultMode: 420
name: chi-clickhouse-olap-common-usersd
name: chi-clickhouse-olap-common-usersd
- configMap:
defaultMode: 420
name: chi-clickhouse-olap-deploy-confd-clickhouse-olap-0-0
name: chi-clickhouse-olap-deploy-confd-clickhouse-olap-0-0
- name: kube-api-access-r2djq
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
- configMap:
items:
- key: service-ca.crt
path: service-ca.crt
name: openshift-service-ca.crt
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2024-07-24T19:42:44Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2024-07-24T19:42:44Z"
message: 'containers with unready status: [clickhouse]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2024-07-24T19:42:44Z"
message: 'containers with unready status: [clickhouse]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2024-07-24T19:42:44Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: cri-o://3a9e1ba9e1cb22a9f9a3b962500d244fe0717c8471848c177f60ac059fec089c
image: docker.io/clickhouse/clickhouse-server:latest
imageID: docker.io/clickhouse/clickhouse-server@sha256:00d808c094fa0e790b662f4ee5b7a7476c990c79907c997ac2a1484a8833ab70
lastState:
terminated:
containerID: cri-o://3a9e1ba9e1cb22a9f9a3b962500d244fe0717c8471848c177f60ac059fec089c
exitCode: 1
finishedAt: "2024-07-24T20:19:41Z"
reason: Error
startedAt: "2024-07-24T20:19:40Z"
name: clickhouse
ready: false
restartCount: 12
started: false
state:
waiting:
message: back-off 5m0s restarting failed container=clickhouse pod=chi-clickhouse-olap-clickhouse-olap-0-0-0_clickhouse(25afed6c-7481-4284-99a3-97caf8516279)
reason: CrashLoopBackOff
- containerID: cri-o://b2c34c6fcabcbe6aa3b97512ea692bb3453ae012b09404a9d2c29456b9885760
image: registry.access.redhat.com/ubi8/ubi-minimal:latest
imageID: registry.access.redhat.com/ubi8/ubi-minimal@sha256:a6e546ff72e0eca114e0bfee08aa5b1bba726fc3986a8fa1e453629e054c4357
lastState: {}
name: clickhouse-log
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2024-07-24T19:42:49Z"
hostIP: 10.17.60.70
phase: Running
podIP: 10.254.22.217
podIPs:
- ip: 10.254.22.217
qosClass: BestEffort
startTime: "2024-07-24T19:42:44Z"
Which component is responsible for the second container with the sleep 30 loop and this securityContext? I don't see where exactly you configure it; it looks like these customizations were added outside of clickhouse-operator.
I think the root cause is this overly strict security context:
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
  runAsNonRoot: true
  runAsUser: 1000730000
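On OpenShift, the restricted-v2 SCC assigns runAsUser from the namespace's annotated UID range instead of the image's default user; the range can be inspected with (a hedged example, assuming standard JSONPath support):
oc get namespace clickhouse -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.uid-range}'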
Could we change it to the following, to match the UID built into the Docker image?
securityContext:
  runAsUser: 101
  runAsGroup: 101
  fsGroup: 101
  allowPrivilegeEscalation: false
  capabilities:
    drop: [ "ALL" ]
    add: [ "CAP_NICE", "CAP_IPC_LOCK" ]
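For reference, 101 is the UID/GID of the clickhouse user baked into the official clickhouse-server image; one way to double-check (a hedged example):
docker run --rm --entrypoint id clickhouse/clickhouse-server:latest clickhouse
This should print something like uid=101(clickhouse) gid=101(clickhouse).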
If we can't change the securityContext, could we add the following to the podTemplate?
env:
  - name: CLIKCHOUSE_UID
    value: 1000730000
Something like this should work:
spec:
defaults:
templates:
podTemlate: custom-uid
templates:
podTemplates:
- name: custom-uid
spec:
containers:
- name: clickhouse
image: clickhouse/clickhouse-server:latest
env:
- name: CLIKCHOUSE_UID
value: 1000730000
- name: CLIKCHOUSE_GID
value: 1000730000
Tried the snippet below, but adding it resulted in the instance creation staying pending:
spec:
defaults:
templates:
podTemlate: custom-uid
templates:
podTemplates:
- name: custom-uid
spec:
containers:
- name: clickhouse
image: clickhouse/clickhouse-server:latest
env:
- name: CLIKCHOUSE_UID
value: 1000730000
- name: CLIKCHOUSE_GID
value: 1000730000
Even after 3 minutes it remains in a pending state, while without the above snippet it got created in a few seconds.
Which clickhouse-operator version do you use? Could you share the output of:
oc get pods --all-namespaces -l app=clickhouse-operator -o jsonpath="{.items[*].spec.containers[*].image}"
altinity/clickhouse-operator:0.23.6 altinity/metrics-exporter:0.23.6
Apply the changes from https://github.com/Altinity/clickhouse-operator/issues/1464#issuecomment-2250043030 again, and when the status is InProgress, share the Events section of:
oc describe chi -n clickhouse clickhouse-olap
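If the Events section turns out empty, namespace-level events can serve as a fallback (assuming standard kubectl/oc flags):
oc get events -n clickhouse --sort-by=.lastTimestamp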
No success.
Even after 65m, no update.
pratikraj@Pratiks-MacBook-Pro common % oc -n clickhouse get po,svc,pvc,ClickHouseInstallation
NAME CLUSTERS HOSTS STATUS HOSTS-COMPLETED AGE
clickhouseinstallation.clickhouse.altinity.com/clickhouse-olap 65m
pratikraj@Pratiks-MacBook-Pro common %
pratikraj@Pratiks-MacBook-Pro common % oc describe chi -n clickhouse clickhouse-olap
Name: clickhouse-olap
Namespace: clickhouse
Labels: <none>
Annotations: <none>
API Version: clickhouse.altinity.com/v1
Kind: ClickHouseInstallation
Metadata:
Creation Timestamp: 2024-07-25T12:47:20Z
Generation: 1
Resource Version: 3999789
UID: 20e63d44-dc3d-4b29-a850-6b688637655f
Spec:
Configuration:
Clusters:
Layout:
Replicas Count: 1
Shards Count: 1
Name: clickhouse-olap
Defaults:
Templates:
Data Volume Claim Template: data-volume-template
Log Volume Claim Template: log-volume-template
Templates:
Pod Templates:
Name: custom-uid
Spec:
Containers:
Env:
Name: CLIKCHOUSE_UID
Value: 1000730000
Name: CLIKCHOUSE_GID
Value: 1000730000
Image: clickhouse/clickhouse-server:latest
Name: clickhouse
Volume Claim Templates:
Name: data-volume-template
Spec:
Access Modes:
ReadWriteOnce
Resources:
Requests:
Storage: 100Gi
Name: log-volume-template
Spec:
Access Modes:
ReadWriteOnce
Resources:
Requests:
Storage: 100Gi
Events: <none>
pratikraj@Pratiks-MacBook-Pro common %
pratikraj@Pratiks-MacBook-Pro common %
One thing I notice: "podTemlate: custom-uid" is missing from the Defaults > Templates section of the describe output.
Check the clickhouse-operator logs:
oc logs --all-namespaces -l app=clickhouse-operator --container clickhouse-operator --since=2h
If that doesn't work, check:
oc get pods --all-namespaces -l app=clickhouse-operator
and:
oc logs -n <your-namespace-where-operator-installed> deployment/clickhouse-operator --container clickhouse-operator --since=2h
Also check:
oc get sts -n clickhouse
In the operator log, I found the issue:
W0726 03:29:44.109340 1 reflector.go:533] pkg/client/informers/externalversions/factory.go:132: failed to list *v1.ClickHouseInstallation: json: cannot unmarshal number into Go struct field EnvVar.items.spec.templates.podTemplates.spec.containers.env.value of type string
E0726 03:29:44.109373 1 reflector.go:148] pkg/client/informers/externalversions/factory.go:132: Failed to watch *v1.ClickHouseInstallation: failed to list *v1.ClickHouseInstallation: json: cannot unmarshal number into Go struct field EnvVar.items.spec.templates.podTemplates.spec.containers.env.value of type string
This was fixed by adding quotes around the env values.
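That is, the env values become YAML strings (an excerpt mirroring the snippet applied below):
env:
  - name: CLIKCHOUSE_UID
    value: "1000730000"
  - name: CLIKCHOUSE_GID
    value: "1000730000"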
But then the same issue occurred while starting the pod:
pratikraj@Pratiks-MacBook-Pro common %
pratikraj@Pratiks-MacBook-Pro common % oc -n clickhouse get po,svc,pvc,ClickHouseInstallation
NAME READY STATUS RESTARTS AGE
pod/chi-clickhouse-olap-clickhouse-olap-0-0-0 1/2 CrashLoopBackOff 18 (2m46s ago) 70m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/chi-clickhouse-olap-clickhouse-olap-0-0 ClusterIP None <none> 9000/TCP,8123/TCP,9009/TCP 65m
service/clickhouse-clickhouse-olap ClusterIP None <none> 8123/TCP,9000/TCP 55m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/data-volume-template-chi-clickhouse-olap-clickhouse-olap-0-0-0 Bound pvc-6f901528-fc9d-468f-a1c6-5d0d1ef687a4 100Gi RWO rook-ceph-block 70m
persistentvolumeclaim/log-volume-template-chi-clickhouse-olap-clickhouse-olap-0-0-0 Bound pvc-6952669c-93ff-4e31-8e74-c6819fa3668f 100Gi RWO rook-ceph-block 70m
NAME CLUSTERS HOSTS STATUS HOSTS-COMPLETED AGE
clickhouseinstallation.clickhouse.altinity.com/clickhouse-olap 1 1 Completed 70m
pratikraj@Pratiks-MacBook-Pro common %
pratikraj@Pratiks-MacBook-Pro common % oc logs -n clickhouse chi-clickhouse-olap-clickhouse-olap-0-0-0
Defaulted container "clickhouse" out of: clickhouse, clickhouse-log
Code: 36. DB::Exception: Group 0 is not found in the system. (BAD_ARGUMENTS) (version 24.6.2.17 (official build))
Couldn't create necessary directory: /var/lib/clickhouse/
pratikraj@Pratiks-MacBook-Pro common %
pratikraj@Pratiks-MacBook-Pro common %
We need to figure out which component added the security context in your OpenShift. Could you share the output of:
oc exec -n clickhouse chi-clickhouse-olap-clickhouse-olap-0-0-0 --container clickhouse-log -- ls -la /var/lib/clickhouse
oc exec -n clickhouse chi-clickhouse-olap-clickhouse-olap-0-0-0 --container clickhouse-log -- whoami
I think the default "PodSecurityPolicy" is enabled. I don't have any other security/policy enforcement tool installed.
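For context: OpenShift admits pods via SecurityContextConstraints rather than PodSecurityPolicy, and this pod was admitted under restricted-v2 (see the openshift.io/scc annotation in the pod YAML above). One way to confirm, assuming standard JSONPath support:
oc get pod -n clickhouse chi-clickhouse-olap-clickhouse-olap-0-0-0 -o jsonpath='{.metadata.annotations.openshift\.io/scc}'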
pratikraj@Pratiks-MacBook-Pro ~ %
pratikraj@Pratiks-MacBook-Pro ~ % oc exec -n clickhouse chi-clickhouse-olap-clickhouse-olap-0-0-0 --container clickhouse-log -- ls -la /var/lib/clickhouse
total 20
drwxrwsrwx. 3 root 1000730000 4096 Jul 26 03:31 .
drwxr-xr-x. 1 root root 24 Jul 26 03:31 ..
drwxrws---. 2 root 1000730000 16384 Jul 26 03:31 lost+found
pratikraj@Pratiks-MacBook-Pro ~ %
pratikraj@Pratiks-MacBook-Pro ~ %
pratikraj@Pratiks-MacBook-Pro ~ % oc exec -n clickhouse chi-clickhouse-olap-clickhouse-olap-0-0-0 --container clickhouse-log -- whoami
1000730000
pratikraj@Pratiks-MacBook-Pro ~ %
pratikraj@Pratiks-MacBook-Pro ~ %
The data directory is already group-writable for fsGroup 1000730000, so the entrypoint's chown shouldn't be needed; let's apply CLICKHOUSE_DO_NOT_CHOWN=1:
spec:
defaults:
templates:
podTemlate: custom-uid
templates:
podTemplates:
- name: custom-uid
spec:
containers:
- name: clickhouse
image: clickhouse/clickhouse-server:latest
env:
- name: CLIKCHOUSE_UID
value: "1000730000"
- name: CLIKCHOUSE_GID
value: "1000730000"
- name: CLICKHOUSE_DO_NOT_CHOWN
value: "1"
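(If I read the official image's entrypoint correctly, CLICKHOUSE_DO_NOT_CHOWN=1 makes it skip the chown of /var/lib/clickhouse and /var/log/clickhouse-server, which a non-root UID could not perform anyway.)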
same issue
@Rajpratik71, is it resolved? What is the reason for the CrashLoopBackOff? (You should be able to see it in the container logs.)
Do you need the clickhouse-log container, btw? It is rarely useful.
@alex-zaitsev same issue.
Getting the following in the log:
pratikraj@Pratiks-MacBook-Pro common % oc logs -n clickhouse chi-clickhouse-olap-clickhouse-olap-0-0-0
Defaulted container "clickhouse" out of: clickhouse, clickhouse-log
Code: 36. DB::Exception: Group 0 is not found in the system. (BAD_ARGUMENTS) (version 24.6.2.17 (official build))
Couldn't create necessary directory: /var/lib/clickhouse/
pratikraj@Pratiks-MacBook-Pro common %
@Rajpratik71, have you tried adding the security context as suggested above?
securityContext:
runAsUser: 101
runAsGroup: 101
fsGroup: 101
allowPrivilegeEscalation: false
Could you post the full CHI spec here, with sensitive data removed?
I have run into the same issue when deploying on OpenShift. I tried the above (with the spelling mistakes fixed) without success. It might be related to ClickHouse/ClickHouse#59141 as well.
The suggested securityContext works, since we are then running with the intended UID and GID for the Docker container. Unfortunately, this means we need a custom SCC (or anyuid) to run the pod. Hopefully the above-mentioned issue gets fixed so that we can run the Docker container as non-root.
For those on OpenShift, you can try this workaround for every namespace:
1. Create the service account:
kubectl create sa clickhouse -n test-clickhouse
2. Give the service account anyuid permissions (or a more restrictive custom SCC that can run as 101; see the sketch after the example below):
oc adm policy add-scc-to-user anyuid -z clickhouse -n test-clickhouse
3. Run the pod with the service account:
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
name: clickhouse
namespace: test-clickhouse
spec:
defaults:
templates:
podTemplate: custom-uid
configuration:
clusters:
- name: clickhouse
layout:
shardsCount: 1
replicasCount: 1
templates:
podTemplates:
- name: custom-uid
spec:
securityContext:
runAsUser: 101
runAsGroup: 101
fsGroup: 101
allowPrivilegeEscalation: false
serviceAccountName: clickhouse
automountServiceAccountToken: false
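If anyuid is too broad, a more restrictive custom SCC pinned to UID/GID 101 might look roughly like this (a sketch only; the name and exact field set are assumptions, not a tested manifest):
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: clickhouse-uid-101   # hypothetical name
allowPrivilegedContainer: false
allowPrivilegeEscalation: false
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
requiredDropCapabilities:
  - ALL
runAsUser:
  type: MustRunAs
  uid: 101
fsGroup:
  type: MustRunAs
  ranges:
    - min: 101
      max: 101
supplementalGroups:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - persistentVolumeClaim
  - projected
  - secret
It would then be bound instead of anyuid (oc adm policy add-scc-to-user clickhouse-uid-101 -z clickhouse -n test-clickhouse), and the pod's openshift.io/scc annotation shows which SCC actually admitted it.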
@keyute thanks for the OpenShift workaround, let's close the issue.
The operator install was successful. When I tried to deploy a ClickHouse instance using the YAML below, I got the following error while starting the pod: