Open ragsarang opened 2 years ago
Describe the unexpected behaviour: When I execute a statement with ON CLUSTER, it should run on all shards and replicas. However, it times out.
How to reproduce
- Which ClickHouse server version to use: 21.12.3.32
- Which interface to use, if it matters: clickhouse-operator
- CREATE TABLE statements for all tables involved:
CREATE TABLE events_local ON CLUSTER '{cluster}'
(
    event_date Date,
    event_type Int32,
    article_id Int32,
    title String
)
ENGINE = ReplicatedMergeTree('/clickhouse/{installation}/{cluster}/tables/{shard}/{database}/{table}', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_type, article_id);
- Queries to run that lead to unexpected result: the statement above, plus this CREATE DATABASE statement:
CREATE DATABASE test ON CLUSTER '{cluster}' ;
Expected behavior: The statement should be executed and the database/table should be created successfully.
Error message and/or stacktrace
Query id: 98c72107-9eab-40be-b56e-11dcefbf4e59

0 rows in set. Elapsed: 180.672 sec.

Received exception from server (version 21.12.3):
Code: 159. DB::Exception: Received from localhost:9000. DB::Exception: Watching task /clickhouse/repl-1s1r/task_queue/ddl/query-0000000003 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 1 unfinished hosts (0 of them are currently active), they are going to execute the query in background. (TIMEOUT_EXCEEDED)
Additional context: ClickHouse/ClickHouse#33019
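If you only need to wait longer to see whether the remaining host eventually picks the task up, the timeout named in the error can be raised per session. This is just a sketch; it lengthens the wait but does not fix a host that cannot execute the task:

SET distributed_ddl_task_timeout = 600;  -- raise the wait from the default 180 seconds
CREATE DATABASE test ON CLUSTER '{cluster}';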
I updated the ClickHouse installation spec with replicasUseFQDN: "yes".
Now the hostname is rendered as shown below, but the address is not rendered properly and I am unable to ping the rendered address from the host_name column. How can we modify the hostname parameter so it renders the correct address as per /etc/hosts?
SELECT *
FROM system.clusters
Query id: 28ce0733-8f8c-4782-bb66-02495c32d732
┌─cluster──────────────────────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name───────────────────────────────────────────┬─host_address─┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─slowdowns_count─┬─estimated_recovery_time─┐
│ all-replicated │ 1 │ 1 │ 1 │ chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local │ │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ all-sharded │ 1 │ 1 │ 1 │ chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local │ │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ replcluster │ 1 │ 1 │ 1 │ chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local │ │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ test_cluster_two_shards │ 1 │ 1 │ 1 │ 127.0.0.1 │ 127.0.0.1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ test_cluster_two_shards │ 2 │ 1 │ 1 │ 127.0.0.2 │ 127.0.0.2 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ test_cluster_two_shards_internal_replication │ 1 │ 1 │ 1 │ 127.0.0.1 │ 127.0.0.1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ test_cluster_two_shards_internal_replication │ 2 │ 1 │ 1 │ 127.0.0.2 │ 127.0.0.2 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ test_cluster_two_shards_localhost │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ test_cluster_two_shards_localhost │ 2 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ test_shard_localhost │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ test_shard_localhost_secure │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9440 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ test_unavailable_shard │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ test_unavailable_shard │ 2 │ 1 │ 1 │ localhost │ ::1 │ 1 │ 0 │ default │ │ 0 │ 0 │ 0 │
└──────────────────────────────────────────────┴───────────┴──────────────┴─────────────┴─────────────────────────────────────────────────────┴──────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────┴─────────────────────────┘
13 rows in set. Elapsed: 0.021 sec.
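Note that host_address is empty for the chi-* hosts, which may mean the FQDN does not resolve. A quick check from inside the pod (a sketch; the pod name comes from this setup, and it assumes getent is present in the image):

$ kubectl exec -n ch1 chi-repl-1s1r-replcluster-0-0-0 -- getent hosts chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local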
Could you share the output of:
kubectl get svc -n ch1 -o wide
and
kubectl get chi -n ch1 repl-1s1r -o yaml
Please find the details below:
kubectl get svc -n ch1 -o wide
NAME                            TYPE           CLUSTER-IP                          EXTERNAL-IP   PORT(S)                         AGE   SELECTOR
chi-repl-1s1r-replcluster-0-0   ClusterIP      None                                <none>        8123/TCP,9000/TCP,9009/TCP      25h   clickhouse.altinity.com/app=chop,clickhouse.altinity.com/chi=repl-1s1r,clickhouse.altinity.com/cluster=replcluster,clickhouse.altinity.com/namespace=ch1,clickhouse.altinity.com/replica=0,clickhouse.altinity.com/shard=0
clickhouse-operator-metrics     ClusterIP      xxxx:xxxx:xxx:xxxx:xxxx:xx:0:a8c0   <none>        8888/TCP                        15d   app=clickhouse-operator
clickhouse-repl-1s1r            LoadBalancer   xxxx:xxxx:xxx:xxxx:xxxx:xx:0:e2ed   <pending>     8123:32187/TCP,9000:30449/TCP   28h   clickhouse.altinity.com/app=chop,clickhouse.altinity.com/chi=repl-1s1r,clickhouse.altinity.com/namespace=ch1,clickhouse.altinity.com/ready=yes
kubectl get chi -n ch1 repl-1s1r -o yaml
apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"clickhouse.altinity.com/v1","kind":"ClickHouseInstallation","metadata":{"annotations":{},"name":"repl-1s1r","namespace":"ch1"},"spec":{"configuration":{"clusters":[{"layout":{"replicasCount":1,"shardsCount":1},"name":"replcluster","templates":{"podTemplate":"clickhouse-with-volume-template"}}],"zookeeper":{"nodes":[{"host":"zookeeper-0.zookeepers.ch2.svc.uhxxxxxx7.local","port":2181},{"host":"zookeeper-1.zookeepers.ch2.svc.uhxxxxxx7.local","port":2181},{"host":"zookeeper-2.zookeepers.ch2.svc.uhxxxxxx7.local","port":2181}]}},"defaults":{"distributedDDL":{"profile":"default"},"replicasUseFQDN":"yes"},"templates":{"podTemplates":[{"name":"clickhouse-with-volume-template","spec":{"containers":[{"image":"private-docker-registry/clickhouse-server:21.12.3.32.ipv6","name":"clickhouse-pod","resources":{"requests":{"cpu":4,"memory":"32G"}},"volumeMounts":[{"mountPath":"/var/lib/clickhouse","name":"clickhouse-storage-template"}]}],"nodeSelector":{"robin.io/rnodetype":"robin-worker-node"},"tolerations":[{"effect":"NoSchedule","key":"k8s.sssssss.com/worker","operator":"Exists"}]}}],"volumeClaimTemplates":[{"name":"clickhouse-storage-template","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"50Gi"}}}}]}}}
creationTimestamp: "2022-01-04T04:38:10Z"
finalizers:
- finalizer.clickhouseinstallation.altinity.com
generation: 11
managedFields:
- apiVersion: clickhouse.altinity.com/v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.: {}
f:kubectl.kubernetes.io/last-applied-configuration: {}
f:spec:
.: {}
f:configuration:
.: {}
f:clusters: {}
f:zookeeper:
.: {}
f:nodes: {}
f:defaults:
.: {}
f:distributedDDL:
.: {}
f:profile: {}
f:replicasUseFQDN: {}
f:templates: {}
manager: kubectl
operation: Update
time: "2022-01-04T18:48:24Z"
- apiVersion: clickhouse.altinity.com/v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:finalizers:
.: {}
v:"finalizer.clickhouseinstallation.altinity.com": {}
f:spec:
f:templates:
f:podTemplates: {}
f:volumeClaimTemplates: {}
f:status:
.: {}
f:actions: {}
f:added: {}
f:clusters: {}
f:endpoint: {}
f:error: {}
f:errors: {}
f:fqdns: {}
f:generation: {}
f:hosts: {}
f:normalized:
.: {}
f:apiVersion: {}
f:kind: {}
f:metadata: {}
f:spec: {}
f:status: {}
f:pods: {}
f:replicas: {}
f:shards: {}
f:status: {}
f:taskID: {}
f:taskIDsCompleted: {}
f:taskIDsStarted: {}
f:version: {}
manager: clickhouse-operator
operation: Update
time: "2022-01-04T18:50:24Z"
name: repl-1s1r
namespace: ch1
resourceVersion: "334779133"
uid: 5432de60-d78b-4bc3-ac71-909ef8ae899b
spec:
configuration:
clusters:
- layout:
replicasCount: 1
shardsCount: 1
name: replcluster
templates:
podTemplate: clickhouse-with-volume-template
zookeeper:
nodes:
- host: zookeeper-0.zookeepers.ch2.svc.uhxxxxxx7.local
port: 2181
- host: zookeeper-1.zookeepers.ch2.svc.uhxxxxxx7.local
port: 2181
- host: zookeeper-2.zookeepers.ch2.svc.uhxxxxxx7.local
port: 2181
defaults:
distributedDDL:
profile: default
replicasUseFQDN: "yes"
templates:
podTemplates:
- name: clickhouse-with-volume-template
spec:
containers:
- image: private-docker-registry/clickhouse-server:21.12.3.32.ipv6
name: clickhouse-pod
resources:
requests:
cpu: 4
memory: 32G
volumeMounts:
- mountPath: /var/lib/clickhouse
name: clickhouse-storage-template
nodeSelector:
robin.io/rnodetype: robin-worker-node
tolerations:
- effect: NoSchedule
key: k8s.sssssss.com/worker
operator: Exists
volumeClaimTemplates:
- name: clickhouse-storage-template
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
status:
actions:
- reconcile completed
- add CHI to monitoring
- remove items scheduled for deletion
- remove items scheduled for deletion
- Update ConfigMap ch1/chi-repl-1s1r-common-configd
- Reconcile Host 0-0 completed
- Update ConfigMap ch1/chi-repl-1s1r-common-configd
- Adding tables on shard/host:0/0 cluster:replcluster
- Update Service ch1/chi-repl-1s1r-replcluster-0-0
- Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - completed
- Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - started
- Update StatefulSet(ch1/chi-repl-1s1r-replcluster-0-0) - started
- Update ConfigMap ch1/chi-repl-1s1r-deploy-confd-replcluster-0-0
- Reconcile Host 0-0 started
- Update ConfigMap ch1/chi-repl-1s1r-common-usersd
- Update ConfigMap ch1/chi-repl-1s1r-common-configd
- Update Service ch1/clickhouse-repl-1s1r
- reconcile started
- 'FAILED to reconcile StatefulSet: chi-repl-1s1r-replcluster-0-0 CHI: repl-1s1r '
- Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - error ignored
- Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - started
- Update StatefulSet(ch1/chi-repl-1s1r-replcluster-0-0) - started
- Update ConfigMap ch1/chi-repl-1s1r-deploy-confd-replcluster-0-0
- Reconcile Host 0-0 started
- Update ConfigMap ch1/chi-repl-1s1r-common-usersd
- Update ConfigMap ch1/chi-repl-1s1r-common-configd
- Update Service ch1/clickhouse-repl-1s1r
- reconcile started
- Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - started
- Update StatefulSet(ch1/chi-repl-1s1r-replcluster-0-0) - started
- Update ConfigMap ch1/chi-repl-1s1r-deploy-confd-replcluster-0-0
- Reconcile Host 0-0 started
- Update ConfigMap ch1/chi-repl-1s1r-common-usersd
- Update ConfigMap ch1/chi-repl-1s1r-common-configd
- Update Service ch1/clickhouse-repl-1s1r
- reconcile started
- 'FAILED to reconcile StatefulSet: chi-repl-1s1r-replcluster-0-0 CHI: repl-1s1r '
- Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - error ignored
- Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - started
- Update ConfigMap ch1/chi-repl-1s1r-deploy-confd-replcluster-0-0
- Reconcile Host 0-0 started
- Update ConfigMap ch1/chi-repl-1s1r-common-usersd
- Update ConfigMap ch1/chi-repl-1s1r-common-configd
- Update Service ch1/clickhouse-repl-1s1r
- reconcile started
- 'FAILED to reconcile StatefulSet: chi-repl-1s1r-replcluster-0-0 CHI: repl-1s1r '
- 'Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - failed with error StatefulSet.apps
"chi-repl-1s1r-replcluster-0-0" is invalid: spec.template.spec.restartPolicy:
Unsupported value: "Never": supported values: "Always"'
- Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - started
- Update ConfigMap ch1/chi-repl-1s1r-deploy-confd-replcluster-0-0
- Reconcile Host 0-0 started
- Update ConfigMap ch1/chi-repl-1s1r-common-usersd
- Update ConfigMap ch1/chi-repl-1s1r-common-configd
- Update Service ch1/clickhouse-repl-1s1r
- reconcile started
- 'FAILED to reconcile StatefulSet: chi-repl-1s1r-replcluster-0-0 CHI: repl-1s1r '
- 'Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - failed with error StatefulSet.apps
"chi-repl-1s1r-replcluster-0-0" is invalid: spec.template.spec.restartPolicy:
Unsupported value: "OnFailure": supported values: "Always"'
- Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - started
- Update ConfigMap ch1/chi-repl-1s1r-deploy-confd-replcluster-0-0
- Reconcile Host 0-0 started
- Update ConfigMap ch1/chi-repl-1s1r-common-usersd
- Update ConfigMap ch1/chi-repl-1s1r-common-configd
- Update Service ch1/clickhouse-repl-1s1r
- reconcile started
- 'FAILED to reconcile StatefulSet: chi-repl-1s1r-replcluster-0-0 CHI: repl-1s1r '
- 'Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - failed with error StatefulSet.apps
"chi-repl-1s1r-replcluster-0-0" is invalid: spec.template.spec.restartPolicy:
Unsupported value: "OnFailure": supported values: "Always"'
- Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - started
- Update StatefulSet(ch1/chi-repl-1s1r-replcluster-0-0) - started
- Update ConfigMap ch1/chi-repl-1s1r-deploy-confd-replcluster-0-0
- Reconcile Host 0-0 started
- Update ConfigMap ch1/chi-repl-1s1r-common-usersd
- Update ConfigMap ch1/chi-repl-1s1r-common-configd
- Update Service ch1/clickhouse-repl-1s1r
- reconcile started
- 'FAILED to reconcile StatefulSet: chi-repl-1s1r-replcluster-0-0 CHI: repl-1s1r '
- Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - error ignored
- Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - started
- |-
Update StatefulSet(ch1/chi-repl-1s1r-replcluster-0-0) - failed with error
---
onStatefulSetCreateFailed - stop
--
Continue with recreate
- Update StatefulSet(ch1/chi-repl-1s1r-replcluster-0-0) - started
- Update ConfigMap ch1/chi-repl-1s1r-deploy-confd-replcluster-0-0
- Reconcile Host 0-0 started
- Update ConfigMap ch1/chi-repl-1s1r-common-usersd
- Update ConfigMap ch1/chi-repl-1s1r-common-configd
- Update Service ch1/clickhouse-repl-1s1r
- reconcile started
- reconcile completed
- add CHI to monitoring
- remove items scheduled for deletion
- remove items scheduled for deletion
- Update ConfigMap ch1/chi-repl-1s1r-common-configd
- Reconcile Host 0-0 completed
- Update ConfigMap ch1/chi-repl-1s1r-common-configd
- Create Service ch1/chi-repl-1s1r-replcluster-0-0
- Update ConfigMap ch1/chi-repl-1s1r-deploy-confd-replcluster-0-0
- Reconcile Host 0-0 started
- Update ConfigMap ch1/chi-repl-1s1r-common-usersd
- Update ConfigMap ch1/chi-repl-1s1r-common-configd
- Update Service ch1/clickhouse-repl-1s1r
- reconcile started
- 'FAILED to reconcile StatefulSet: chi-repl-1s1r-replcluster-0-0 CHI: repl-1s1r '
- Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - error ignored
added: 1
clusters: 1
endpoint: clickhouse-repl-1s1r.ch1.svc.cluster.local
error: 'FAILED update: onStatefulSetCreateFailed - ignore'
errors:
- 'FAILED update: onStatefulSetCreateFailed - ignore'
- 'FAILED to reconcile StatefulSet: chi-repl-1s1r-replcluster-0-0 CHI: repl-1s1r '
- 'FAILED update: onStatefulSetCreateFailed - ignore'
- 'FAILED to reconcile StatefulSet: chi-repl-1s1r-replcluster-0-0 CHI: repl-1s1r '
- 'FAILED update: StatefulSet.apps "chi-repl-1s1r-replcluster-0-0" is invalid: spec.template.spec.restartPolicy:
Unsupported value: "Never": supported values: "Always"'
- 'FAILED to reconcile StatefulSet: chi-repl-1s1r-replcluster-0-0 CHI: repl-1s1r '
- 'Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - failed with error StatefulSet.apps
"chi-repl-1s1r-replcluster-0-0" is invalid: spec.template.spec.restartPolicy:
Unsupported value: "Never": supported values: "Always"'
- 'FAILED update: StatefulSet.apps "chi-repl-1s1r-replcluster-0-0" is invalid: spec.template.spec.restartPolicy:
Unsupported value: "OnFailure": supported values: "Always"'
- 'FAILED to reconcile StatefulSet: chi-repl-1s1r-replcluster-0-0 CHI: repl-1s1r '
- 'Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - failed with error StatefulSet.apps
"chi-repl-1s1r-replcluster-0-0" is invalid: spec.template.spec.restartPolicy:
Unsupported value: "OnFailure": supported values: "Always"'
- 'FAILED update: StatefulSet.apps "chi-repl-1s1r-replcluster-0-0" is invalid: spec.template.spec.restartPolicy:
Unsupported value: "OnFailure": supported values: "Always"'
- 'FAILED to reconcile StatefulSet: chi-repl-1s1r-replcluster-0-0 CHI: repl-1s1r '
- 'Create StatefulSet ch1/chi-repl-1s1r-replcluster-0-0 - failed with error StatefulSet.apps
"chi-repl-1s1r-replcluster-0-0" is invalid: spec.template.spec.restartPolicy:
Unsupported value: "OnFailure": supported values: "Always"'
- 'FAILED update: onStatefulSetCreateFailed - ignore'
- 'FAILED to reconcile StatefulSet: chi-repl-1s1r-replcluster-0-0 CHI: repl-1s1r '
- |-
Update StatefulSet(ch1/chi-repl-1s1r-replcluster-0-0) - failed with error
---
onStatefulSetCreateFailed - stop
--
Continue with recreate
- 'FAILED to drop replica on host 1-0 with error FAILED connect(http://***:***@chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local:8123/)
for SQL: SYSTEM DROP REPLICA ''chi-repl-1s1r-replcluster-1-0.ch1.svc.cluster.local'''
- 'FAILED update: onStatefulSetCreateFailed - ignore'
- 'FAILED to reconcile StatefulSet: chi-repl-1s1r-replcluster-0-0 CHI: repl-1s1r '
- 'FAILED update: onStatefulSetCreateFailed - ignore'
- 'FAILED to reconcile StatefulSet: chi-repl-1s1r-replcluster-0-0 CHI: repl-1s1r '
fqdns:
- chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local
generation: 11
hosts: 1
normalized:
apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
creationTimestamp: "2022-01-04T04:38:10Z"
finalizers:
- finalizer.clickhouseinstallation.altinity.com
generation: 11
managedFields:
- apiVersion: clickhouse.altinity.com/v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:finalizers:
.: {}
v:"finalizer.clickhouseinstallation.altinity.com": {}
f:spec:
f:templates:
f:volumeClaimTemplates: {}
f:status:
.: {}
f:action: {}
f:actions: {}
f:added: {}
f:clusters: {}
f:endpoint: {}
f:error: {}
f:errors: {}
f:fqdns: {}
f:generation: {}
f:hosts: {}
f:normalized:
.: {}
f:apiVersion: {}
f:kind: {}
f:metadata: {}
f:spec: {}
f:status: {}
f:pods: {}
f:replicas: {}
f:shards: {}
f:status: {}
f:taskID: {}
f:taskIDsCompleted: {}
f:taskIDsStarted: {}
f:version: {}
manager: clickhouse-operator
operation: Update
time: "2022-01-04T16:39:09Z"
- apiVersion: clickhouse.altinity.com/v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.: {}
f:kubectl.kubernetes.io/last-applied-configuration: {}
f:spec:
.: {}
f:configuration:
.: {}
f:clusters: {}
f:zookeeper:
.: {}
f:nodes: {}
f:defaults:
.: {}
f:distributedDDL:
.: {}
f:profile: {}
f:replicasUseFQDN: {}
f:templates:
.: {}
f:podTemplates: {}
f:volumeClaimTemplates: {}
manager: kubectl
operation: Update
time: "2022-01-04T18:48:24Z"
name: repl-1s1r
namespace: ch1
resourceVersion: "334777111"
uid: 5432de60-d78b-4bc3-ac71-909ef8ae899b
spec:
configuration:
clusters:
- layout:
replicas:
- name: "0"
shards:
- httpPort: 8123
interserverHTTPPort: 9009
name: 0-0
tcpPort: 9000
templates:
podTemplate: clickhouse-with-volume-template
shardsCount: 1
templates:
podTemplate: clickhouse-with-volume-template
replicasCount: 1
shards:
- internalReplication: "false"
name: "0"
replicas:
- httpPort: 8123
interserverHTTPPort: 9009
name: 0-0
tcpPort: 9000
templates:
podTemplate: clickhouse-with-volume-template
replicasCount: 1
templates:
podTemplate: clickhouse-with-volume-template
shardsCount: 1
name: replcluster
templates:
podTemplate: clickhouse-with-volume-template
zookeeper:
nodes:
- host: zookeeper-0.zookeepers.ch2.svc.uhxxxxxx7.local
port: 2181
- host: zookeeper-1.zookeepers.ch2.svc.uhxxxxxx7.local
port: 2181
- host: zookeeper-2.zookeepers.ch2.svc.uhxxxxxx7.local
port: 2181
users:
default/networks/host_regexp: (chi-repl-1s1r-[^.]+\d+-\d+|clickhouse\-repl-1s1r)\.ch1\.svc\.cluster\.local$
default/networks/ip:
- ::1
- 127.0.0.1
default/profile: default
default/quota: default
zookeeper:
nodes:
- host: zookeeper-0.zookeepers.ch2.svc.uhxxxxxx7.local
port: 2181
- host: zookeeper-1.zookeepers.ch2.svc.uhxxxxxx7.local
port: 2181
- host: zookeeper-2.zookeepers.ch2.svc.uhxxxxxx7.local
port: 2181
defaults:
distributedDDL:
profile: default
replicasUseFQDN: "true"
reconciling:
cleanup:
reconcileFailedObjects:
configMap: Retain
pvc: Retain
service: Retain
statefulSet: Retain
unknownObjects:
configMap: Delete
pvc: Delete
service: Delete
statefulSet: Delete
configMapPropagationTimeout: 60
policy: unspecified
stop: "false"
taskID: 9f4049b4-dab5-4168-9198-3b8612a7fc79
templates:
PodTemplatesIndex: {}
VolumeClaimTemplatesIndex: {}
podTemplates:
- metadata:
creationTimestamp: null
name: clickhouse-with-volume-template
spec:
containers:
- image: private-docker-registry/clickhouse-server:21.12.3.32.ipv6
name: clickhouse-pod
resources:
requests:
cpu: "4"
memory: 32G
volumeMounts:
- mountPath: /var/lib/clickhouse
name: clickhouse-storage-template
nodeSelector:
robin.io/rnodetype: robin-worker-node
tolerations:
- effect: NoSchedule
key: k8s.sssssss.com/worker
operator: Exists
zone: {}
volumeClaimTemplates:
- metadata:
creationTimestamp: null
name: clickhouse-storage-template
reclaimPolicy: Delete
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
templating:
policy: manual
troubleshoot: "false"
status:
clusters: 0
hosts: 0
replicas: 0
shards: 0
status: ""
pods:
- chi-repl-1s1r-replcluster-0-0-0
replicas: 0
shards: 1
status: Completed
taskID: 9f4049b4-dab5-4168-9198-3b8612a7fc79
taskIDsCompleted:
- 9f4049b4-dab5-4168-9198-3b8612a7fc79
- 2ddcd52d-fbb3-4ab6-9d05-b17d4a9b688d
taskIDsStarted:
- 9f4049b4-dab5-4168-9198-3b8612a7fc79
- 49353693-ef6f-45ae-b24f-3c82f9ca779e
- 567a480c-9668-449d-b144-a26fc850a37d
- 5c9adbee-97b1-43d9-a721-8b4104e4d9d1
- e6039ff5-74c2-415d-b71e-320d62ff158c
- 79008ff8-fbe7-4e04-809a-bb61f29853cf
- b44df750-344a-426a-8e6e-f750044ae19c
- cbe3cb8b-7681-4c9e-a935-2aa1ab02eda7
- 2ddcd52d-fbb3-4ab6-9d05-b17d4a9b688d
- c691d200-6e7a-4140-88ae-a031404b6eca
- b9f98bce-8eb8-41c9-a8c1-34383e08b771
version: 0.18.0
Some values are obfuscated due to confidential info.
chi-repl-1s1r-replcluster-0-0 ClusterIP None
This does not look good; the other ClusterIP-type services have IPv6 addresses.
Could you share the output of:
kubectl get endpoints -n ch1
and
kubectl get svc -n ch1 chi-repl-1s1r-replcluster-0-0 -o yaml
and
kubectl describe svc -n ch1 chi-repl-1s1r-replcluster-0-0
Regarding "the other ClusterIP-type services have IPv6 addresses": yes, our Kubernetes cluster has only the IPv6 protocol.
kubectl get endpoints -n ch1
NAME                            ENDPOINTS                                                                                                                    AGE
chi-repl-1s1r-replcluster-0-0   [xxxx:xxxx:xxx:xxxx:xxxx:xx:0:4e88]:8123,[xxxx:xxxx:xxx:xxxx:xxxx:xx:0:4e88]:9009,[xxxx:xxxx:xxx:xxxx:xxxx:xx:0:4e88]:9000   30h
clickhouse-operator-metrics     [xxxx:xxxx:xxx:xxxx:xxxx:xx:0:47a1]:8888                                                                                     16d
clickhouse-repl-1s1r            [xxxx:xxxx:xxx:xxxx:xxxx:xx:0:4e88]:8123,[xxxx:xxxx:xxx:xxxx:xxxx:xx:0:4e88]:9000                                            33h
kubectl get svc -n ch1 chi-repl-1s1r-replcluster-0-0 -o yaml
apiVersion: v1
kind: Service
metadata:
creationTimestamp: "2022-01-04T07:49:20Z"
labels:
clickhouse.altinity.com/Service: host
clickhouse.altinity.com/app: chop
clickhouse.altinity.com/chi: repl-1s1r
clickhouse.altinity.com/cluster: replcluster
clickhouse.altinity.com/namespace: ch1
clickhouse.altinity.com/object-version: c770326e6ee81e25b1a7b91bb9c9100c00bd7d41
clickhouse.altinity.com/replica: "0"
clickhouse.altinity.com/shard: "0"
managedFields:
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:labels:
.: {}
f:clickhouse.altinity.com/Service: {}
f:clickhouse.altinity.com/app: {}
f:clickhouse.altinity.com/chi: {}
f:clickhouse.altinity.com/cluster: {}
f:clickhouse.altinity.com/namespace: {}
f:clickhouse.altinity.com/object-version: {}
f:clickhouse.altinity.com/replica: {}
f:clickhouse.altinity.com/shard: {}
f:ownerReferences:
.: {}
k:{"uid":"5432de60-d78b-4bc3-ac71-909ef8ae899b"}:
.: {}
f:apiVersion: {}
f:blockOwnerDeletion: {}
f:controller: {}
f:kind: {}
f:name: {}
f:uid: {}
f:spec:
f:clusterIP: {}
f:ports:
.: {}
k:{"port":8123,"protocol":"TCP"}:
.: {}
f:name: {}
f:port: {}
f:protocol: {}
f:targetPort: {}
k:{"port":9000,"protocol":"TCP"}:
.: {}
f:name: {}
f:port: {}
f:protocol: {}
f:targetPort: {}
k:{"port":9009,"protocol":"TCP"}:
.: {}
f:name: {}
f:port: {}
f:protocol: {}
f:targetPort: {}
f:publishNotReadyAddresses: {}
f:selector:
.: {}
f:clickhouse.altinity.com/app: {}
f:clickhouse.altinity.com/chi: {}
f:clickhouse.altinity.com/cluster: {}
f:clickhouse.altinity.com/namespace: {}
f:clickhouse.altinity.com/replica: {}
f:clickhouse.altinity.com/shard: {}
f:sessionAffinity: {}
f:type: {}
manager: clickhouse-operator
operation: Update
time: "2022-01-04T07:49:20Z"
name: chi-repl-1s1r-replcluster-0-0
namespace: ch1
ownerReferences:
- apiVersion: clickhouse.altinity.com/v1
blockOwnerDeletion: true
controller: true
kind: ClickHouseInstallation
name: repl-1s1r
uid: 5432de60-d78b-4bc3-ac71-909ef8ae899b
resourceVersion: "334136070"
uid: cdb0130e-03c8-449e-ad97-f2504c47efd4
spec:
clusterIP: None
clusterIPs:
- None
ports:
- name: http
port: 8123
protocol: TCP
targetPort: 8123
- name: tcp
port: 9000
protocol: TCP
targetPort: 9000
- name: interserver
port: 9009
protocol: TCP
targetPort: 9009
publishNotReadyAddresses: true
selector:
clickhouse.altinity.com/app: chop
clickhouse.altinity.com/chi: repl-1s1r
clickhouse.altinity.com/cluster: replcluster
clickhouse.altinity.com/namespace: ch1
clickhouse.altinity.com/replica: "0"
clickhouse.altinity.com/shard: "0"
sessionAffinity: None
type: ClusterIP
status:
loadBalancer: {}
kubectl describe svc -n ch1 chi-repl-1s1r-replcluster-0-0
Name: chi-repl-1s1r-replcluster-0-0
Namespace: ch1
Labels: clickhouse.altinity.com/Service=host
clickhouse.altinity.com/app=chop
clickhouse.altinity.com/chi=repl-1s1r
clickhouse.altinity.com/cluster=replcluster
clickhouse.altinity.com/namespace=ch1
clickhouse.altinity.com/object-version=c770326e6ee81e25b1a7b91bb9c9100c00bd7d41
clickhouse.altinity.com/replica=0
clickhouse.altinity.com/shard=0
Annotations: <none>
Selector: clickhouse.altinity.com/app=chop,clickhouse.altinity.com/chi=repl-1s1r,clickhouse.altinity.com/cluster=replcluster,clickhouse.altinity.com/namespace=ch1,clickhouse.altinity.com/replica=0,clickhouse.altinity.com/shard=0
Type: ClusterIP
IP: None
Port: http 8123/TCP
TargetPort: 8123/TCP
Endpoints: [xxxx:xxxx:xxx:xxxx:xxxx:xx:0:4e88]:8123
Port: tcp 9000/TCP
TargetPort: 9000/TCP
Endpoints: [xxxx:xxxx:xxx:xxxx:xxxx:xx:0:4e88]:9000
Port: interserver 9009/TCP
TargetPort: 9009/TCP
Endpoints: [xxxx:xxxx:xxx:xxxx:xxxx:xx:0:4e88]:9009
Session Affinity: None
Events: <none>
Additional info: I updated the clickhouse operator yaml parameter chConfigNetworksHostRegexpTemplate
chConfigNetworksHostRegexpTemplate: "(chi-{chi}-[^.]+\\d+-\\d+|clickhouse\\-{chi})\\.{namespace}\\.svc\\.uhxxxxxx\\.local$"
Because our cluster has a specific clusterDomain, "uhxxxxxx.local". But the common-configd ConfigMap is still using cluster.local in the chop-generated-remote_servers.xml section.
Because of this, the host_name values in system.clusters still contain cluster.local and none of them can be resolved.
After updating chConfigNetworksHostRegexpTemplate you should restart the clickhouse-operator deployment manually (this lets us control when configuration changes are applied to clickhouse-operator), and then re-apply your kind: ClickHouseInstallation manifest (change something in the manifest so it reconciles).
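For example (a sketch; the operator deployment's namespace and the manifest file name here are assumptions for this environment):

$ kubectl rollout restart deployment clickhouse-operator -n kube-system   # make the operator reload its config
$ kubectl apply -n ch1 -f repl-1s1r-chi.yaml                              # re-apply the CHI (with some change) to trigger reconciliation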
Yes, I tried it. I created a new operator installation with the chConfigNetworksHostRegexpTemplate change, but the endpoints are still using cluster.local.
I even added this before deploying: default/networks/host_regexp: (chi-repl-1s1r-[^.]+\d+-\d+|clickhouse-repl-1s1r).ch1.svc.uhxxxxxx.local$
$ kubectl get chi repl-1s1r -n ch -oyaml | grep local
{"apiVersion":"clickhouse.altinity.com/v1","kind":"ClickHouseInstallation","metadata":{"annotations":{},"name":"repl-1s1r","namespace":"ch"},"spec":{"configuration":{"clusters":[{"layout":{"replicasCount":2,"shardsCount":2},"name":"replcluster","templates":{"podTemplate":"clickhouse-with-volume-template"}}],"users":{"default/networks/host_regexp":"(chi-repl-1s1r-[^.]+\\d+-\\d+|clickhouse\\-repl-1s1r)\\.ch1\\.svc\\.uhxxxxxxx\\.local$"},"zookeeper":{"nodes":[{"host":"zookeeper-0.zookeepers.ch2.svc.uhxxxxxxx.local","port":2181},{"host":"zookeeper-1.zookeepers.ch2.svc.uhxxxxxxx.local","port":2181},{"host":"zookeeper-2.zookeepers.ch2.svc.uhxxxxxxx.local","port":2181}]}},"defaults":{"distributedDDL":{"profile":"default"},"replicasUseFQDN":"yes"},"templates":{"podTemplates":[{"name":"clickhouse-with-volume-template","spec":{"containers":[{"image":"private-docker-registry/clickhouse-server:21.12.3.32.ipv6","name":"clickhouse-pod","resources":{"requests":{"cpu":2,"memory":"32G"}},"volumeMounts":[{"mountPath":"/var/lib/clickhouse","name":"clickhouse-storage-template"}]}],"nodeSelector":{"robin.io/rnodetype":"robin-worker-node"},"tolerations":[{"effect":"NoSchedule","key":"k8s.ssssssss.com/worker","operator":"Exists"}]}}],"volumeClaimTemplates":[{"name":"clickhouse-storage-template","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"50Gi"}}}}]}}}
default/networks/host_regexp: (chi-repl-1s1r-[^.]+\d+-\d+|clickhouse\-repl-1s1r)\.ch1\.svc\.uhxxxxxxx\.local$
- host: zookeeper-0.zookeepers.ch2.svc.uhxxxxxxx.local
- host: zookeeper-1.zookeepers.ch2.svc.uhxxxxxxx.local
- host: zookeeper-2.zookeepers.ch2.svc.uhxxxxxxx.local
- image: private-docker-registry/clickhouse-server:21.12.3.32.ipv6
endpoint: clickhouse-repl-1s1r.ch.svc.cluster.local
- chi-repl-1s1r-replcluster-0-0.ch.svc.cluster.local
- chi-repl-1s1r-replcluster-0-1.ch.svc.cluster.local
- chi-repl-1s1r-replcluster-1-0.ch.svc.cluster.local
- chi-repl-1s1r-replcluster-1-1.ch.svc.cluster.local
- host: zookeeper-0.zookeepers.ch2.svc.uhxxxxxxx.local
- host: zookeeper-1.zookeepers.ch2.svc.uhxxxxxxx.local
- host: zookeeper-2.zookeepers.ch2.svc.uhxxxxxxx.local
default/networks/host_regexp: (chi-repl-1s1r-[^.]+\d+-\d+|clickhouse\-repl-1s1r)\.ch1\.svc\.uhxxxxxxx\.local$
- host: zookeeper-0.zookeepers.ch2.svc.uhxxxxxxx.local
- host: zookeeper-1.zookeepers.ch2.svc.uhxxxxxxx.local
- host: zookeeper-2.zookeepers.ch2.svc.uhxxxxxxx.local
- image: private-docker-registry/clickhouse-server:21.12.3.32.ipv6
This is the CHI file used for the deployment:
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
name: "repl-1s1r"
spec:
defaults:
replicasUseFQDN: "yes"
distributedDDL:
profile: default
configuration:
zookeeper:
nodes:
- host: zookeeper-0.zookeepers.ch2.svc.uhxxxxxxx.local
port: 2181
- host: zookeeper-1.zookeepers.ch2.svc.uhxxxxxxx.local
port: 2181
- host: zookeeper-2.zookeepers.ch2.svc.uhxxxxxxx.local
port: 2181
clusters:
- name: replcluster
templates:
podTemplate: clickhouse-with-volume-template
layout:
shardsCount: 2
replicasCount: 2
users:
default/networks/host_regexp: (chi-repl-1s1r-[^.]+\d+-\d+|clickhouse\-repl-1s1r)\.ch1\.svc\.uhxxxxxxx\.local$
templates:
podTemplates:
- name: clickhouse-with-volume-template
spec:
containers:
- name: clickhouse-pod
image: private-docker-registry/clickhouse-server:21.12.3.32.ipv6
volumeMounts:
- name: clickhouse-storage-template
mountPath: /var/lib/clickhouse
resources:
requests:
cpu: 2
memory: 32G
#restartPolicy: Always
nodeSelector:
robin.io/rnodetype: "robin-worker-node"
# robin.io/rnodetype: "robin-master-node"
tolerations:
- key: "k8s.ssssssss.com/worker"
operator: "Exists"
effect: "NoSchedule"
volumeClaimTemplates:
- name: clickhouse-storage-template
spec:
# no storageClassName - means use default storageClassName
#storageClassName: default
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
Change your kind: ClickHouseInstallation manifest and add:
spec:
  namespaceDomainPattern: "%s.svc.uhxxxxxxx.local"
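For reference, a minimal sketch of where this field sits (names reused from the CHI above; the rest of the spec stays as already posted):

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "repl-1s1r"
spec:
  namespaceDomainPattern: "%s.svc.uhxxxxxxx.local"
  defaults:
    replicasUseFQDN: "yes"
  # configuration, templates, etc. unchanged from the manifest posted earlier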
I might have been looking for this exact configuration for the cluster domain. I will try it out and update you.
I added namespaceDomainPattern to the CHI YAML and recreated the entire setup, from the operator to the CHI. This parameter fixed the hostname rendering in system.clusters. However, the base problem still exists: distributed DDL times out (error log below). Based on the logs, it seems it is still getting cluster.local from somewhere.
SELECT *
FROM system.clusters
Query id: 1f7a32c6-2e1a-4589-8acb-74b5a522ea76
┌─cluster──────────────────────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─────────────────────────────────────────────┬─host_address─────────────────────┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─slowdowns_count─┬─estimated_recovery_time─┐
│ all-replicated │ 1 │ 1 │ 1 │ chi-repl-1s1r-replcluster-0-0.ch1.svc.uhxxxxxxx.local │ 240b:c0e0:104:544d:b464:2:0:484d │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ all-replicated │ 1 │ 1 │ 2 │ chi-repl-1s1r-replcluster-0-1.ch1.svc.uhxxxxxxx.local │ 240b:c0e0:104:544d:b464:2:0:4e0f │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ all-replicated │ 1 │ 1 │ 3 │ chi-repl-1s1r-replcluster-1-0.ch1.svc.uhxxxxxxx.local │ 240b:c0e0:104:544d:b464:2:0:4949 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ all-replicated │ 1 │ 1 │ 4 │ chi-repl-1s1r-replcluster-1-1.ch1.svc.uhxxxxxxx.local │ 240b:c0e0:104:544d:b464:2:0:4883 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ all-sharded │ 1 │ 1 │ 1 │ chi-repl-1s1r-replcluster-0-0.ch1.svc.uhxxxxxxx.local │ 240b:c0e0:104:544d:b464:2:0:484d │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ all-sharded │ 2 │ 1 │ 1 │ chi-repl-1s1r-replcluster-0-1.ch1.svc.uhxxxxxxx.local │ 240b:c0e0:104:544d:b464:2:0:4e0f │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ all-sharded │ 3 │ 1 │ 1 │ chi-repl-1s1r-replcluster-1-0.ch1.svc.uhxxxxxxx.local │ 240b:c0e0:104:544d:b464:2:0:4949 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ all-sharded │ 4 │ 1 │ 1 │ chi-repl-1s1r-replcluster-1-1.ch1.svc.uhxxxxxxx.local │ 240b:c0e0:104:544d:b464:2:0:4883 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ replcluster │ 1 │ 1 │ 1 │ chi-repl-1s1r-replcluster-0-0.ch1.svc.uhxxxxxxx.local │ 240b:c0e0:104:544d:b464:2:0:484d │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ replcluster │ 1 │ 1 │ 2 │ chi-repl-1s1r-replcluster-0-1.ch1.svc.uhxxxxxxx.local │ 240b:c0e0:104:544d:b464:2:0:4e0f │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ replcluster │ 2 │ 1 │ 1 │ chi-repl-1s1r-replcluster-1-0.ch1.svc.uhxxxxxxx.local │ 240b:c0e0:104:544d:b464:2:0:4949 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ replcluster │ 2 │ 1 │ 2 │ chi-repl-1s1r-replcluster-1-1.ch1.svc.uhxxxxxxx.local │ 240b:c0e0:104:544d:b464:2:0:4883 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ test_cluster_two_shards │ 1 │ 1 │ 1 │ 127.0.0.1 │ 127.0.0.1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ test_cluster_two_shards │ 2 │ 1 │ 1 │ 127.0.0.2 │ 127.0.0.2 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ test_cluster_two_shards_internal_replication │ 1 │ 1 │ 1 │ 127.0.0.1 │ 127.0.0.1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ test_cluster_two_shards_internal_replication │ 2 │ 1 │ 1 │ 127.0.0.2 │ 127.0.0.2 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ test_cluster_two_shards_localhost │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ test_cluster_two_shards_localhost │ 2 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ test_shard_localhost │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ test_shard_localhost_secure │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9440 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ test_unavailable_shard │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ test_unavailable_shard │ 2 │ 1 │ 1 │ localhost │ ::1 │ 1 │ 0 │ default │ │ 0 │ 0 │ 0 │
└──────────────────────────────────────────────┴───────────┴──────────────┴─────────────┴───────────────────────────────────────────────────────┴──────────────────────────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────┴─────────────────────────┘
22 rows in set. Elapsed: 0.002 sec.
$ kubectl logs pod/chi-repl-1s1r-replcluster-0-0-0 -n ch1 | tail -100 | grep DNS
2022.01.06 19:39:51.969738 [ 246 ] {} <Error> DNSResolver: Cannot resolve host (chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local), error 0: chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local.
2022.01.06 19:39:51.969962 [ 246 ] {} <Error> DDLWorker: Unexpected error, will try to restart main thread:: Code: 198. DB::Exception: Not found address of host: chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local. (DNS_ERROR), Stack trace (when copying this message, always include the lines below):
3. DB::DNSResolver::resolveAddress(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned short) @ 0xa2c8ea3 in /usr/bin/clickhouse
2022.01.06 19:39:56.986005 [ 246 ] {} <Error> DNSResolver: Cannot resolve host (chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local), error 0: chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local.
2022.01.06 19:39:56.986224 [ 246 ] {} <Error> DDLWorker: Unexpected error, will try to restart main thread:: Code: 198. DB::Exception: Not found address of host: chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local. (DNS_ERROR), Stack trace (when copying this message, always include the lines below):
3. DB::DNSResolver::resolveAddress(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned short) @ 0xa2c8ea3 in /usr/bin/clickhouse
(version 21.12.3.32 (official build))
2022.01.06 19:59:21.319037 [ 172 ] {} <Debug> DNSResolver: Updated DNS cache
2022.01.06 19:59:21.661099 [ 108 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 46.48 GiB.
2022.01.06 19:59:21.690797 [ 111 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 46.48 GiB.
2022.01.06 19:59:22.253401 [ 110 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 46.48 GiB.
2022.01.06 19:59:22.936695 [ 85 ] {} <Debug> system.session_log (75cebf14-fe4e-4992-b5ce-bf14fe4e4992): Removing part from filesystem 202201_1_105_21
2022.01.06 19:59:22.937126 [ 85 ] {} <Debug> system.session_log (75cebf14-fe4e-4992-b5ce-bf14fe4e4992): Removing part from filesystem 202201_106_106_0
2022.01.06 19:59:22.937385 [ 85 ] {} <Debug> system.session_log (75cebf14-fe4e-4992-b5ce-bf14fe4e4992): Removing part from filesystem 202201_107_107_0
2022.01.06 19:59:22.937632 [ 85 ] {} <Debug> system.session_log (75cebf14-fe4e-4992-b5ce-bf14fe4e4992): Removing part from filesystem 202201_108_108_0
2022.01.06 19:59:22.937854 [ 85 ] {} <Debug> system.session_log (75cebf14-fe4e-4992-b5ce-bf14fe4e4992): Removing part from filesystem 202201_109_109_0
2022.01.06 19:59:22.938103 [ 85 ] {} <Debug> system.session_log (75cebf14-fe4e-4992-b5ce-bf14fe4e4992): Removing part from filesystem 202201_110_110_0
2022.01.06 19:59:24.572079 [ 106 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 46.48 GiB.
2022.01.06 19:59:26.237852 [ 99 ] {02d4001d-75e8-46a7-99a7-cba58f5ba31e} <Error> executeQuery: Code: 159. DB::Exception: Watching task /clickhouse/repl-1s1r/task_queue/ddl/query-0000000013 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 4 unfinished hosts (0 of them are currently active), they are going to execute the query in background. (TIMEOUT_EXCEEDED) (version 21.12.3.32 (official build)) (from [::1]:35892) (in query: CREATE DATABASE test ON CLUSTER '{cluster}';), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xa21959a in /usr/bin/clickhouse
1. DB::Exception::Exception<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, long&, unsigned long&, unsigned long&>(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, long&, unsigned long&, unsigned long&) @ 0x13526564 in /usr/bin/clickhouse
2. ? @ 0x13522da5 in /usr/bin/clickhouse
3. DB::DDLQueryStatusSource::generate() @ 0x1352121e in /usr/bin/clickhouse
4. DB::ISource::tryGenerate() @ 0x14024515 in /usr/bin/clickhouse
5. DB::ISource::work() @ 0x140240da in /usr/bin/clickhouse
6. DB::SourceWithProgress::work() @ 0x1423d742 in /usr/bin/clickhouse
7. DB::ExecutionThreadContext::executeTask() @ 0x14043ae3 in /usr/bin/clickhouse
8. DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*) @ 0x1403835e in /usr/bin/clickhouse
9. DB::PipelineExecutor::executeImpl(unsigned long) @ 0x140371a9 in /usr/bin/clickhouse
10. DB::PipelineExecutor::execute(unsigned long) @ 0x14036eb8 in /usr/bin/clickhouse
11. ? @ 0x14047607 in /usr/bin/clickhouse
12. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xa25a3b7 in /usr/bin/clickhouse
13. ? @ 0xa25ddbd in /usr/bin/clickhouse
14. ? @ 0x7f273cf8e609 in ?
15. clone @ 0x7f273ceb5293 in ?
2022.01.06 19:59:26.237956 [ 99 ] {02d4001d-75e8-46a7-99a7-cba58f5ba31e} <Error> TCPHandler: Code: 159. DB::Exception: Watching task /clickhouse/repl-1s1r/task_queue/ddl/query-0000000013 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 4 unfinished hosts (0 of them are currently active), they are going to execute the query in background. (TIMEOUT_EXCEEDED), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xa21959a in /usr/bin/clickhouse
1. DB::Exception::Exception<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, long&, unsigned long&, unsigned long&>(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, long&, unsigned long&, unsigned long&) @ 0x13526564 in /usr/bin/clickhouse
2. ? @ 0x13522da5 in /usr/bin/clickhouse
3. DB::DDLQueryStatusSource::generate() @ 0x1352121e in /usr/bin/clickhouse
4. DB::ISource::tryGenerate() @ 0x14024515 in /usr/bin/clickhouse
5. DB::ISource::work() @ 0x140240da in /usr/bin/clickhouse
6. DB::SourceWithProgress::work() @ 0x1423d742 in /usr/bin/clickhouse
7. DB::ExecutionThreadContext::executeTask() @ 0x14043ae3 in /usr/bin/clickhouse
8. DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*) @ 0x1403835e in /usr/bin/clickhouse
9. DB::PipelineExecutor::executeImpl(unsigned long) @ 0x140371a9 in /usr/bin/clickhouse
10. DB::PipelineExecutor::execute(unsigned long) @ 0x14036eb8 in /usr/bin/clickhouse
11. ? @ 0x14047607 in /usr/bin/clickhouse
12. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xa25a3b7 in /usr/bin/clickhouse
13. ? @ 0xa25ddbd in /usr/bin/clickhouse
14. ? @ 0x7f273cf8e609 in ?
15. clone @ 0x7f273ceb5293 in ?
2022.01.06 19:59:26.238093 [ 99 ] {02d4001d-75e8-46a7-99a7-cba58f5ba31e} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
2022.01.06 19:59:26.238133 [ 99 ] {} <Debug> TCPHandler: Processed in 180.760750757 sec.
2022.01.06 19:59:26.299626 [ 246 ] {} <Debug> DDLWorker: Initializing DDLWorker thread
2022.01.06 19:59:26.307408 [ 246 ] {} <Debug> DDLWorker: Initialized DDLWorker thread
2022.01.06 19:59:26.307555 [ 246 ] {} <Debug> DDLWorker: Scheduling tasks
2022.01.06 19:59:26.308295 [ 246 ] {} <Debug> DDLWorker: Will schedule 14 tasks starting from query-0000000000
2022.01.06 19:59:26.312075 [ 246 ] {} <Debug> DDLWorker: Will not execute task query-0000000000: Task has been already processed
2022.01.06 19:59:26.321915 [ 246 ] {} <Error> DNSResolver: Cannot resolve host (chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local), error 0: chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local.
2022.01.06 19:59:26.322200 [ 246 ] {} <Error> DDLWorker: Unexpected error, will try to restart main thread:: Code: 198. DB::Exception: Not found address of host: chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local. (DNS_ERROR), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xa21959a in /usr/bin/clickhouse
1. ? @ 0xa2c77d1 in /usr/bin/clickhouse
2. ? @ 0xa2c7fa2 in /usr/bin/clickhouse
3. DB::DNSResolver::resolveAddress(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned short) @ 0xa2c8ea3 in /usr/bin/clickhouse
4. DB::HostID::isLocalAddress(unsigned short) const @ 0x12cff40b in /usr/bin/clickhouse
5. DB::DDLTask::findCurrentHostID(std::__1::shared_ptr<DB::Context const>, Poco::Logger*) @ 0x12d01f81 in /usr/bin/clickhouse
6. DB::DDLWorker::initAndCheckTask(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, std::__1::shared_ptr<zkutil::ZooKeeper> const&) @ 0x12d0ba46 in /usr/bin/clickhouse
7. DB::DDLWorker::scheduleTasks(bool) @ 0x12d0f106 in /usr/bin/clickhouse
8. DB::DDLWorker::runMainThread() @ 0x12d091e5 in /usr/bin/clickhouse
9. ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::DDLWorker::*)(), DB::DDLWorker*>(void (DB::DDLWorker::*&&)(), DB::DDLWorker*&&)::'lambda'()::operator()() @ 0x12d1d9d7 in /usr/bin/clickhouse
10. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xa25a3b7 in /usr/bin/clickhouse
11. ? @ 0xa25ddbd in /usr/bin/clickhouse
12. ? @ 0x7f273cf8e609 in ?
13. clone @ 0x7f273ceb5293 in ?
(version 21.12.3.32 (official build))
2022.01.06 19:59:26.322243 [ 246 ] {} <Information> DDLWorker: Cleaned DDLWorker state
2022.01.06 19:59:28.727337 [ 102 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 46.48 GiB.
2022.01.06 19:59:29.193515 [ 111 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 46.48 GiB.
2022.01.06 19:59:29.260263 [ 110 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 46.48 GiB.
2022.01.06 19:59:31.322339 [ 246 ] {} <Debug> DDLWorker: Initializing DDLWorker thread
2022.01.06 19:59:31.330814 [ 246 ] {} <Debug> DDLWorker: Initialized DDLWorker thread
2022.01.06 19:59:31.330851 [ 246 ] {} <Debug> DDLWorker: Scheduling tasks
2022.01.06 19:59:31.331512 [ 246 ] {} <Debug> DDLWorker: Will schedule 14 tasks starting from query-0000000000
2022.01.06 19:59:31.334344 [ 246 ] {} <Debug> DDLWorker: Will not execute task query-0000000000: Task has been already processed
2022.01.06 19:59:31.343967 [ 246 ] {} <Error> DNSResolver: Cannot resolve host (chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local), error 0: chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local.
2022.01.06 19:59:31.344184 [ 246 ] {} <Error> DDLWorker: Unexpected error, will try to restart main thread:: Code: 198. DB::Exception: Not found address of host: chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local. (DNS_ERROR), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xa21959a in /usr/bin/clickhouse
1. ? @ 0xa2c77d1 in /usr/bin/clickhouse
2. ? @ 0xa2c7fa2 in /usr/bin/clickhouse
3. DB::DNSResolver::resolveAddress(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned short) @ 0xa2c8ea3 in /usr/bin/clickhouse
4. DB::HostID::isLocalAddress(unsigned short) const @ 0x12cff40b in /usr/bin/clickhouse
5. DB::DDLTask::findCurrentHostID(std::__1::shared_ptr<DB::Context const>, Poco::Logger*) @ 0x12d01f81 in /usr/bin/clickhouse
6. DB::DDLWorker::initAndCheckTask(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, std::__1::shared_ptr<zkutil::ZooKeeper> const&) @ 0x12d0ba46 in /usr/bin/clickhouse
7. DB::DDLWorker::scheduleTasks(bool) @ 0x12d0f106 in /usr/bin/clickhouse
8. DB::DDLWorker::runMainThread() @ 0x12d091e5 in /usr/bin/clickhouse
9. ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::DDLWorker::*)(), DB::DDLWorker*>(void (DB::DDLWorker::*&&)(), DB::DDLWorker*&&)::'lambda'()::operator()() @ 0x12d1d9d7 in /usr/bin/clickhouse
10. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xa25a3b7 in /usr/bin/clickhouse
11. ? @ 0xa25ddbd in /usr/bin/clickhouse
12. ? @ 0x7f273cf8e609 in ?
13. clone @ 0x7f273ceb5293 in ?
(version 21.12.3.32 (official build))
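Since DDLWorker keeps failing on chi-repl-1s1r-replcluster-0-0.ch1.svc.cluster.local, the previously queued DDL tasks in ZooKeeper apparently still carry host IDs with the old cluster.local suffix. One way to confirm (a sketch, using the task-queue path from the error above):

SELECT name, value
FROM system.zookeeper
WHERE path = '/clickhouse/repl-1s1r/task_queue/ddl'
ORDER BY name;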
Did you destroy and re-create ZooKeeper after re-installing your CHI manifest?
I deleted the ZooKeeper pods after reinstalling the CHI manifest, and the StatefulSet recreated them. However, clickhouse-client still shows the same error.
Any updates?
Any updates here? I'm facing the same problem.
@SolydBoy which problem? Do you have a custom cluster domain, or are you just unable to run ON CLUSTER queries in clickhouse-operator-managed ClickHouse?
@SolydBoy @ragsarang Facing the same issue. Did you get this fixed, by any chance?
@Mizoguchee could you explain which problem you have?
@Slach I have set up a 3-shard cluster where each shard has 2 replicas. It was working when I set it up initially. Now I had to redo these instances, but I'm stuck.
CREATE DATABASE IF NOT EXISTS newDB ON CLUSTER clickhouse_cluster;
Received exception from server (version 24.8.2): Code: 159. DB::Exception: Received from localhost:9000. DB::Exception: Distributed DDL task /clickhouse/task_queue/ddl/query-0000000009 is not finished on 3 of 6 hosts (0 of them are currently executing the task, 0 are inactive). They are going to execute the query in background. Was waiting for 900.089174443 seconds, which is longer than distributed_ddl_task_timeout. (TIMEOUT_EXCEEDED)
I increased the timeout to 900 on all three servers, but no luck.
The weird part is that it works when each shard has a single replica and all replicas are on port 9000 on their respective hosts, whereas it times out when each shard has multiple replicas and the replicas are on different ports, like 9000 and 9999, on each server.
I just don't know what happened. It was working fine the day I configured the multiple shards and replicas.
Below is my current setup
<remote_servers>
<clickhouse_cluster>
<shard>
<replica>
<host>100.10.0.51</host>
<port>9000</port>
</replica>
<replica>
<host>100.10.0.207</host>
<port>9999</port>
</replica>
</shard>
<shard>
<replica>
<host>100.10.0.207</host>
<port>9000</port>
</replica>
<replica>
<host>100.10.0.200</host>
<port>9999</port>
</replica>
</shard>
<shard>
<replica>
<host>100.10.0.200</host>
<port>9000</port>
</replica>
<replica>
<host>100.10.0.51</host>
<port>9999</port>
</replica>
</shard>
</clickhouse_cluster>
</remote_servers>
@Mizoguchee are you sure your ClickHouse is managed by clickhouse-operator? Could you share kubectl get chi -n <your-namespace> <your-chi-name> -o yaml?
@Slach Sorry, I'm using it in VMs, not K8s. Below is my sample ClickHouse Keeper config.
@Mizoguchee this repository is about clickhouse-operator, not about general ClickHouse questions.
Check the system.clusters table on each of the 6 hosts; your cluster should contain is_local=1 for the local host.
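For example, on each host (a sketch using the cluster name from the config above):

SELECT host_name, port, is_local
FROM system.clusters
WHERE cluster = 'clickhouse_cluster';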
Yes, it shows 1 for each node. It's not 6 hosts, it's three hosts with two ports open: one port for replica 1 and one port for replica 2.