
[bitnami/clickhouse] Connection to Clickhouse-Keeper broken/unresponsive #15935

Open marcleibold opened 1 year ago

marcleibold commented 1 year ago

Name and Version

bitnami/clickhouse 3.1.5

What architecture are you using?

amd64

What steps will reproduce the bug?

  1. In a GKE (Google Kubernetes Engine) Cluster
  2. With the attached values.yaml
  3. Apply with Terraform (shouldn't be any different from a standard helm install)

Result: the pods are running without any suspicious logs, but for any statement executed ON CLUSTER (whether run after exec-ing into a pod or from the web UI), the progress indicator never goes past 49%. This was tried with a CREATE TABLE statement creating a ReplicatedMergeTree table on the cluster. The ClickHouse cluster consists of 2 shards and 2 replicas.

Are you using any custom parameters or values?

Our values.yaml:

fullnameOverride: clickhouse-replicated

# ClickHouse Parameters

image:
  registry: docker.io
  repository: bitnami/clickhouse
  tag: "23-debian-11"
  pullPolicy: IfNotPresent

shards: ${CLICKHOUSE_SHARDS_COUNT}
replicaCount: ${CLICKHOUSE_REPLICAS_COUNT}

containerPorts:
  http: 8123
  https: 8443
  tcp: 9000
  tcpSecure: 9440
  keeper: 2181
  keeperSecure: 3181
  keeperInter: 9444
  mysql: 9004
  postgresql: 9005
  interserver: 9009
  metrics: 8001

auth:
  username: clickhouse_operator
  password: "${CLICKHOUSE_PASSWORD}"

logLevel: trace

keeper:
  enabled: true

zookeeper:
  enabled: false

defaultConfigurationOverrides: |
  <clickhouse>
    <!-- Macros -->
    <macros>
      <shard from_env="CLICKHOUSE_SHARD_ID"></shard>
      <replica from_env="CLICKHOUSE_REPLICA_ID"></replica>
      <layer>{{ include "common.names.fullname" . }}</layer>
    </macros>
    <!-- Log Level -->
    <logger>
      <level>{{ .Values.logLevel }}</level>
    </logger>
    {{- if or (ne (int .Values.shards) 1) (ne (int .Values.replicaCount) 1)}}
    <!-- Cluster configuration - Any update of the shards and replicas requires helm upgrade -->
    <remote_servers>
      <default>
        {{- $shards := $.Values.shards | int }}
        {{- range $shard, $e := until $shards }}
        <shard>
            <internal_replication>true</internal_replication>
            {{- $replicas := $.Values.replicaCount | int }}
            {{- range $i, $_e := until $replicas }}
            <replica>
                <host>{{ printf "%s-shard%d-%d.%s.%s.svc.%s" (include "common.names.fullname" $ ) $shard $i (include "clickhouse.headlessServiceName" $) (include "common.names.namespace" $) $.Values.clusterDomain }}</host>
                <port>{{ $.Values.service.ports.tcp }}</port>
            </replica>
            {{- end }}
        </shard>
        {{- end }}
      </default>
    </remote_servers>
    {{- end }}
    {{- if .Values.keeper.enabled }}
    <!-- keeper configuration -->
    <keeper_server>
      {{/*ClickHouse keeper configuration using the helm chart */}}
      <tcp_port>{{ $.Values.containerPorts.keeper }}</tcp_port>
      {{- if .Values.tls.enabled }}
      <tcp_port_secure>{{ $.Values.containerPorts.keeperSecure }}</tcp_port_secure>
      {{- end }}
      <server_id from_env="KEEPER_SERVER_ID"></server_id>
      <log_storage_path>/bitnami/clickhouse/keeper/coordination/log</log_storage_path>
      <snapshot_storage_path>/bitnami/clickhouse/keeper/coordination/snapshots</snapshot_storage_path>
      <coordination_settings>
          <operation_timeout_ms>10000</operation_timeout_ms>
          <session_timeout_ms>30000</session_timeout_ms>
          <raft_logs_level>trace</raft_logs_level>
      </coordination_settings>
      <raft_configuration>
      {{- $nodes := .Values.replicaCount | int }}
      {{- range $node, $e := until $nodes }}
      <server>
        <id>{{ $node | int }}</id>
        <hostname from_env="{{ printf "KEEPER_NODE_%d" $node }}"></hostname>
        <port>{{ $.Values.service.ports.keeperInter }}</port>
      </server>
      {{- end }}
      </raft_configuration>
    </keeper_server>
    {{- end }}
    {{- if or .Values.keeper.enabled .Values.zookeeper.enabled .Values.externalZookeeper.servers }}
    <!-- Zookeeper configuration -->
    <zookeeper>
      {{- if or .Values.keeper.enabled }}
      {{- $nodes := .Values.replicaCount | int }}
      {{- range $node, $e := until $nodes }}
      <node>
        <host from_env="{{ printf "KEEPER_NODE_%d" $node }}"></host>
        <port>{{ $.Values.service.ports.keeper }}</port>
      </node>
      {{- end }}
      {{- else if .Values.zookeeper.enabled }}
      {{/* Zookeeper configuration using the helm chart */}}
      {{- $nodes := .Values.zookeeper.replicaCount | int }}
      {{- range $node, $e := until $nodes }}
      <node>
        <host from_env="{{ printf "KEEPER_NODE_%d" $node }}"></host>
        <port>{{ $.Values.zookeeper.service.ports.client }}</port>
      </node>
      {{- end }}
      {{- else if .Values.externalZookeeper.servers }}
      {{/* Zookeeper configuration using an external instance */}}
      {{- range $node :=.Values.externalZookeeper.servers }}
      <node>
        <host>{{ $node }}</host>
        <port>{{ $.Values.externalZookeeper.port }}</port>
      </node>
      {{- end }}
      {{- end }}
    </zookeeper>
    {{- end }}
    <distributed_ddl>
        <path>/clickhouse/task_queue/ddl</path>
    </distributed_ddl>
    {{- if .Values.tls.enabled }}
    <!-- TLS configuration -->
    <tcp_port_secure from_env="CLICKHOUSE_TCP_SECURE_PORT"></tcp_port_secure>
    <https_port from_env="CLICKHOUSE_HTTPS_PORT"></https_port>
    <openSSL>
        <server>
            {{- $certFileName := default "tls.crt" .Values.tls.certFilename }}
            {{- $keyFileName := default "tls.key" .Values.tls.certKeyFilename }}
            <certificateFile>/bitnami/clickhouse/certs/{{$certFileName}}</certificateFile>
            <privateKeyFile>/bitnami/clickhouse/certs/{{$keyFileName}}</privateKeyFile>
            <verificationMode>none</verificationMode>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
            {{- if or .Values.tls.autoGenerated .Values.tls.certCAFilename }}
            {{- $caFileName := default "ca.crt" .Values.tls.certCAFilename }}
            <caConfig>/bitnami/clickhouse/certs/{{$caFileName}}</caConfig>
            {{- else }}
            <loadDefaultCAFile>true</loadDefaultCAFile>
            {{- end }}
        </server>
        <client>
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
            <verificationMode>none</verificationMode>
            <invalidCertificateHandler>
                <name>AcceptCertificateHandler</name>
            </invalidCertificateHandler>
        </client>
    </openSSL>
    {{- end }}
    {{- if .Values.metrics.enabled }}
     <!-- Prometheus metrics -->
     <prometheus>
        <endpoint>/metrics</endpoint>
        <port from_env="CLICKHOUSE_METRICS_PORT"></port>
        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>true</asynchronous_metrics>
    </prometheus>
    {{- end }}
    <profiles>
      <default>
        <distributed_ddl_task_timeout>900</distributed_ddl_task_timeout>
      </default>
    </profiles>
  </clickhouse>

extraVolumes:
  - name: clickhouse-client-config
    configMap:
      name: clickhouse-client-config

extraVolumeMounts:
  - name: clickhouse-client-config
    mountPath: /etc/clickhouse-client/

initdbScripts:
  create_bigtable.sh: |
    <init script (not working)>

# TLS configuration

tls:
  enabled: true
  autoGenerated: false
  certificatesSecret: clickhouse-tls-secret
  certFilename: tls.crt
  certKeyFilename: tls.key
  certCAFilename: ca.crt

# Traffic Exposure Parameters

## ClickHouse service parameters

## http: ClickHouse service HTTP port
## https: ClickHouse service HTTPS port
## tcp: ClickHouse service TCP port
## tcpSecure: ClickHouse service TCP (secure) port
## keeper: ClickHouse keeper TCP container port
## keeperSecure: ClickHouse keeper TCP (secure) container port
## keeperInter: ClickHouse keeper interserver TCP container port
## mysql: ClickHouse service MySQL port
## postgresql: ClickHouse service PostgreSQL port
## interserver: ClickHouse service Interserver port
## metrics: ClickHouse service metrics port

service:
  type: LoadBalancer  
  ports:
    https: 443
  loadBalancerIP: "${LOAD_BALANCER_IP}"

## Persistence Parameters

persistence:
  enabled: true
  accessModes:
    - ReadWriteOnce
  size: ${CLICKHOUSE_DATA_VOLUME_SIZE}

## Prometheus metrics

metrics:
  enabled: true
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "{{ .Values.containerPorts.metrics }}"

serviceAccount:
  create: true

What is the expected behavior?

The expected behaviour is normal creation of the tables within the distributed_ddl_task_timeout.

What do you see instead?

The table creation (tested with the clickhouse-client command after exec-ing into the pod) is stuck at 49% progress.

CREATE TABLE logs_replicated ON CLUSTER default
(
    `gateway_flow_id` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
PRIMARY KEY gateway_flow_id
ORDER BY gateway_flow_id
SETTINGS index_granularity = 8192

Query id: 2abb72cc-3a48-4416-8e46-f9edbf219463

┌─host────────────────────────────────────────────────────────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-replicated-shard1-1.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   3 │                1 │
└─────────────────────────────────────────────────────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host────────────────────────────────────────────────────────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-replicated-shard1-0.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   2 │                0 │
└─────────────────────────────────────────────────────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
← Progress: 2.00 rows, 262.00 B (0.07 rows/s., 9.19 B/s.)  49%

When aborted, the table seems to have been created.

SHOW TABLES

Query id: 8be31ab0-5c0c-40a2-a733-8c5bcb57f35f

┌─name────────────┐
│ logs_replicated │
└─────────────────┘

1 row in set. Elapsed: 0.002 sec.

When trying to drop the tables, the same problem occurs:

DROP TABLE logs_replicated ON CLUSTER default

Query id: b0112235-fdca-4334-b116-b68a451e8dba

┌─host────────────────────────────────────────────────────────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-replicated-shard1-0.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   3 │                0 │
│ clickhouse-replicated-shard1-1.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   2 │                0 │
└─────────────────────────────────────────────────────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
↗ Progress: 2.00 rows, 262.00 B (0.23 rows/s., 30.60 B/s.)  49%

The tables seem to have been created, but the command doesn't finish, so I believe ClickHouse Keeper executes the command but never answers it.

When I try to create the table again (since I assumed Keeper had executed the last command), the error tells me that the replica already exists, not the table itself. So the problem seems to lie somewhere with the replicas:

CREATE TABLE logs_replicated ON CLUSTER default
(
    `gateway_flow_id` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
PRIMARY KEY gateway_flow_id
ORDER BY gateway_flow_id
SETTINGS index_granularity = 8192

Query id: c83fb364-73fa-4dfb-b419-8581628c97fb

┌─host────────────────────────────────────────────────────────────────────────────────────┬─port─┬─status─┬─error───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-replicated-shard1-0.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │    253 │ Code: 253. DB::Exception: Replica /clickhouse/tables/shard1/default/logs_replicated/replicas/clickhouse-replicated-shard1-0 already exists. (REPLICA_ALREADY_EXISTS) (version 23.3.1.2823 (official build)) │                   3 │                0 │
│ clickhouse-replicated-shard1-1.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │    253 │ Code: 253. DB::Exception: Replica /clickhouse/tables/shard1/default/logs_replicated/replicas/clickhouse-replicated-shard1-1 already exists. (REPLICA_ALREADY_EXISTS) (version 23.3.1.2823 (official build)) │                   2 │                0 │
└─────────────────────────────────────────────────────────────────────────────────────────┴──────┴────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────┴──────────────────┘
↓ Progress: 2.00 rows, 668.00 B (0.09 rows/s., 31.49 B/s.)  49%
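
To check whether Keeper actually recorded the DDL task and the replica metadata, one can query the system.zookeeper table from inside a pod. This is only a diagnostic sketch (it assumes the CLICKHOUSE_ADMIN_USER/CLICKHOUSE_ADMIN_PASSWORD environment variables present in the container and the coordination paths from the CREATE TABLE statement above); it shows what Keeper stored, it does not fix the hang:

# List the queued ON CLUSTER DDL entries stored in Keeper/ZooKeeper
clickhouse-client --user "$CLICKHOUSE_ADMIN_USER" --password "$CLICKHOUSE_ADMIN_PASSWORD" \
  --query "SELECT name, ctime FROM system.zookeeper WHERE path = '/clickhouse/task_queue/ddl'"

# List the replicas registered for the table created above
clickhouse-client --user "$CLICKHOUSE_ADMIN_USER" --password "$CLICKHOUSE_ADMIN_PASSWORD" \
  --query "SELECT name FROM system.zookeeper WHERE path = '/clickhouse/tables/shard1/default/logs_replicated/replicas'"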

Additional information

No response

javsalgar commented 1 year ago

Hi,

Does the issue happen when using the zookeeper included in the chart? Just to pinpoint where the issue could be.

marcleibold commented 1 year ago

Hi,

I have it configured like this now:

keeper:
   enabled: false
zookeeper:
   enabled: true
   replicaCount: 3

And now the command just completes normally:

CREATE TABLE logs_replicated ON CLUSTER default
(
    `gateway_flow_id` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
PRIMARY KEY gateway_flow_id
ORDER BY gateway_flow_id
SETTINGS index_granularity = 8192

Query id: 0c7dd092-a396-4fe6-9ca9-0001a867c370

┌─host────────────────────────────────────────────────────────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-replicated-shard0-1.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   3 │                0 │
│ clickhouse-replicated-shard0-0.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   2 │                0 │
│ clickhouse-replicated-shard1-1.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   1 │                0 │
│ clickhouse-replicated-shard1-0.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   0 │                0 │
└─────────────────────────────────────────────────────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

4 rows in set. Elapsed: 0.321 sec.

fmulero commented 1 year ago

Thanks @marcleibold for letting us know. Have you faced the issue with the default defaultConfigurationOverrides? Did you change that value when you moved to zookeeper?

marcleibold commented 1 year ago

Hi @fmulero, I did not change anything when I tried it with zookeeper, so the defaultConfigurationOverrides were still the same as described above. When I now remove the defaultConfigurationOverrides from the values.yaml completely and try the CREATE TABLE command again, it gets stuck at 49% again:

CREATE TABLE logs_replicated ON CLUSTER default
(
    `gateway_flow_id` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
PRIMARY KEY gateway_flow_id
ORDER BY gateway_flow_id
SETTINGS index_granularity = 8192

Query id: 3cbb9139-279a-4853-9038-a2208a08444a

┌─host────────────────────────────────────────────────────────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-replicated-shard1-0.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   3 │                0 │
│ clickhouse-replicated-shard1-1.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │      0 │       │                   2 │                0 │
└─────────────────────────────────────────────────────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
↖ Progress: 2.00 rows, 262.00 B (0.21 rows/s., 27.98 B/s.)  49%

fmulero commented 1 year ago

Hi @marcleibold

I've reproduced the same issue in a simpler scenario, just enabling keeper:

helm install myrelease bitnami/clickhouse --set keeper.enabled=true --set zookeeper.enabled=false

I've checked the keeper status and it seems there are no active clients (10.42.1.26 is the IP of my pod).

$ echo stat | nc localhost 2181
ClickHouse Keeper version: v23.3.1.2823-testing-46e85357ce2da2a99f56ee83a079e892d7ec3726
Clients:
 10.42.1.26:45740(recved=0,sent=0)
 10.42.1.26:49358(recved=5005,sent=5006)

Latency min/avg/max: 0/0/6
Received: 5005
Sent: 5006
Connections: 1
Outstanding: 0
Zxid: 961
Mode: follower
Node count: 80

It seems something is misconfigured in keeper. It needs further investigation; please bear with us.
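
For anyone checking the same thing, Keeper's other four-letter commands can be run the same way to see whether each node thinks it is healthy and part of a quorum. A minimal sketch, assuming the default keeper port 2181 used by this chart:

# A healthy Keeper node answers "imok"
echo ruok | nc localhost 2181

# Role of the node (leader/follower) and basic counters
echo srvr | nc localhost 2181

# Detailed metrics; on the leader this should also report follower counts
echo mntr | nc localhost 2181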

roberthorn commented 1 year ago

I think the issue may be here; I don't think KEEPER_SERVER_ID is actually set anywhere.

marcleibold commented 1 year ago

It seems like that is the issue. I also do not see the KEEPER_SERVER_ID when I run set in one of the containers:

I have no name!@clickhouse-replicated-shard1-0:/$ set
APP_VERSION=23.3.1
BASH=/bin/bash
BASHOPTS=checkwinsize:cmdhist:complete_fullquote:expand_aliases:extquote:force_fignore:globasciiranges:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
BASH_ALIASES=()
BASH_ARGC=([0]="0")
BASH_ARGV=()
BASH_CMDS=()
BASH_LINENO=()
BASH_SOURCE=()
BASH_VERSINFO=([0]="5" [1]="1" [2]="4" [3]="1" [4]="release" [5]="x86_64-pc-linux-gnu")
BASH_VERSION='5.1.4(1)-release'
BITNAMI_APP_NAME=clickhouse
BITNAMI_DEBUG=false
CLICKHOUSE_ADMIN_PASSWORD=<redacted>
CLICKHOUSE_ADMIN_USER=<redacted>
CLICKHOUSE_HTTPS_PORT=8443
CLICKHOUSE_HTTP_PORT=8123
CLICKHOUSE_INTERSERVER_HTTP_PORT=9009
CLICKHOUSE_KEEPER_INTER_PORT=9444
CLICKHOUSE_KEEPER_PORT=2181
CLICKHOUSE_KEEPER_SECURE_PORT=3181
CLICKHOUSE_METRICS_PORT=8001
CLICKHOUSE_MYSQL_PORT=9004
CLICKHOUSE_POSTGRESQL_PORT=9005
CLICKHOUSE_REPLICATED_PORT=tcp://10.0.46.111:8123
CLICKHOUSE_REPLICATED_PORT_2181_TCP=tcp://10.0.46.111:2181
CLICKHOUSE_REPLICATED_PORT_2181_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_2181_TCP_PORT=2181
CLICKHOUSE_REPLICATED_PORT_2181_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_3181_TCP=tcp://10.0.46.111:3181
CLICKHOUSE_REPLICATED_PORT_3181_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_3181_TCP_PORT=3181
CLICKHOUSE_REPLICATED_PORT_3181_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_443_TCP=tcp://10.0.46.111:443
CLICKHOUSE_REPLICATED_PORT_443_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_443_TCP_PORT=443
CLICKHOUSE_REPLICATED_PORT_443_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_8001_TCP=tcp://10.0.46.111:8001
CLICKHOUSE_REPLICATED_PORT_8001_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_8001_TCP_PORT=8001
CLICKHOUSE_REPLICATED_PORT_8001_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_8123_TCP=tcp://10.0.46.111:8123
CLICKHOUSE_REPLICATED_PORT_8123_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_8123_TCP_PORT=8123
CLICKHOUSE_REPLICATED_PORT_8123_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9000_TCP=tcp://10.0.46.111:9000
CLICKHOUSE_REPLICATED_PORT_9000_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9000_TCP_PORT=9000
CLICKHOUSE_REPLICATED_PORT_9000_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9004_TCP=tcp://10.0.46.111:9004
CLICKHOUSE_REPLICATED_PORT_9004_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9004_TCP_PORT=9004
CLICKHOUSE_REPLICATED_PORT_9004_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9005_TCP=tcp://10.0.46.111:9005
CLICKHOUSE_REPLICATED_PORT_9005_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9005_TCP_PORT=9005
CLICKHOUSE_REPLICATED_PORT_9005_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9009_TCP=tcp://10.0.46.111:9009
CLICKHOUSE_REPLICATED_PORT_9009_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9009_TCP_PORT=9009
CLICKHOUSE_REPLICATED_PORT_9009_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9440_TCP=tcp://10.0.46.111:9440
CLICKHOUSE_REPLICATED_PORT_9440_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9440_TCP_PORT=9440
CLICKHOUSE_REPLICATED_PORT_9440_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9444_TCP=tcp://10.0.46.111:9444
CLICKHOUSE_REPLICATED_PORT_9444_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9444_TCP_PORT=9444
CLICKHOUSE_REPLICATED_PORT_9444_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_SERVICE_HOST=10.0.46.111
CLICKHOUSE_REPLICATED_SERVICE_PORT=8123
CLICKHOUSE_REPLICATED_SERVICE_PORT_HTTP=8123
CLICKHOUSE_REPLICATED_SERVICE_PORT_HTTPS=443
CLICKHOUSE_REPLICATED_SERVICE_PORT_HTTP_INTERSRV=9009
CLICKHOUSE_REPLICATED_SERVICE_PORT_HTTP_METRICS=8001
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP=9000
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_KEEPER=2181
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_KEEPERINTER=9444
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_KEEPERTLS=3181
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_MYSQL=9004
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_POSTGRESQL=9005
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_SECURE=9440
CLICKHOUSE_REPLICA_ID=clickhouse-replicated-shard1-0
CLICKHOUSE_SHARD_ID=shard1
CLICKHOUSE_TCP_PORT=9000
CLICKHOUSE_TCP_SECURE_PORT=9440
CLICKHOUSE_TLS_CA_FILE=/opt/bitnami/clickhouse/certs/ca.crt
CLICKHOUSE_TLS_CERT_FILE=/opt/bitnami/clickhouse/certs/tls.crt
CLICKHOUSE_TLS_KEY_FILE=/opt/bitnami/clickhouse/certs/tls.key
COLUMNS=155
DIRSTACK=()
EUID=1001
GROUPS=()
HISTFILE=//.bash_history
HISTFILESIZE=500
HISTSIZE=500
HOME=/
HOSTNAME=clickhouse-replicated-shard1-0
HOSTTYPE=x86_64
IFS=$' \t\n'
KEEPER_NODE_0=clickhouse-replicated-shard1-0.clickhouse-replicated-headless.default.svc.cluster.local
KEEPER_NODE_1=clickhouse-replicated-shard1-1.clickhouse-replicated-headless.default.svc.cluster.local
KUBERNETES_PORT=tcp://10.0.32.1:443
KUBERNETES_PORT_443_TCP=tcp://10.0.32.1:443
KUBERNETES_PORT_443_TCP_ADDR=10.0.32.1
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_HOST=10.0.32.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443
LINES=17
MACHTYPE=x86_64-pc-linux-gnu
MAILCHECK=60
OPTERR=1
OPTIND=1
OSTYPE=linux-gnu
OS_ARCH=amd64
OS_FLAVOUR=debian-11
OS_NAME=linux
PATH=/opt/bitnami/common/bin:/opt/bitnami/clickhouse/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PIPESTATUS=([0]="1")
PPID=0
PS1='${debian_chroot:+($debian_chroot)}\u@\h:\w\$ '
PS2='> '
PS4='+ '
PWD=/
SHELL=/bin/sh
SHELLOPTS=braceexpand:emacs:hashall:histexpand:history:interactive-comments:monitor
SHLVL=1
TERM=xterm
UID=1001
_=']'
clickhouseCTL_API=3

marcleibold commented 1 year ago

Although the variable should be set in this script.

The line also works completely fine, as I just tested inside my container:

I have no name!@clickhouse-replicated-shard1-0:/$ echo $KEEPER_SERVER_ID

I have no name!@clickhouse-replicated-shard1-0:/$ if [[ -f "/bitnami/clickhouse/keeper/data/myid" ]]; then
    export KEEPER_SERVER_ID="$(cat /bitnami/clickhouse/keeper/data/myid)"
else
    HOSTNAME="$(hostname -s)"
    if [[ $HOSTNAME =~ (.*)-([0-9]+)$ ]]; then
        export KEEPER_SERVER_ID=${BASH_REMATCH[2]}
    else
        echo "Failed to get index from hostname $HOST"
        exit 1
    fi
fi
I have no name!@clickhouse-replicated-shard1-0:/$ echo $KEEPER_SERVER_ID
0
I have no name!@clickhouse-replicated-shard1-0:/$

The script is also present in the configmap, but it is apparently just not executed for some reason.

marcleibold commented 1 year ago

Another thing I checked: since the last line in the script is the following: exec /opt/bitnami/scripts/clickhouse/entrypoint.sh /opt/bitnami/scripts/clickhouse/run.sh -- --listen_host=0.0.0.0

There should be a process called setup.sh running after the script is run (which is also the case when it is run manually). This process is not there when I run top, so the issue is almost certainly in where the script is supposed to get executed.
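
A quick way to confirm what the container actually starts with (and whether the keeper-id script from the configmap is part of it) is to compare the pod spec with what runs as PID 1. A rough sketch, assuming the pod names from the outputs above:

# Command/args the pod is configured with
kubectl get pod clickhouse-replicated-shard1-0 \
  -o jsonpath='{.spec.containers[0].command} {.spec.containers[0].args}{"\n"}'

# What is actually running as PID 1 inside the container
kubectl exec clickhouse-replicated-shard1-0 -- cat /proc/1/cmdline | tr '\0' ' '; echo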

fmulero commented 1 year ago

Thanks a lot for all the clues! I made some changes and ran some tests, but it is taking longer than expected and I am also hitting some issues with shards. I've just opened an internal task to address it. We will keep you posted on any news.

marcleibold commented 1 year ago

Alright, thanks for your effort on this and for keeping me posted!

Nello-Angelo commented 1 year ago

I have the same problem; the init script doesn't work:

initdbScripts:
      create-extra-db.sql: |
         CREATE DATABASE [IF NOT EXISTS] test_datasets;
         GRANT ALL ON test_datasets.* TO clickhouse;

Jojoooo1 commented 1 year ago

Were you able to fix it?

fmulero commented 1 year ago

Sorry, there are no updates on this 😞

exfly commented 1 year ago

Any workaround here?

marcleibold commented 1 year ago

Any workaround here?

Not as far as I know; just use the built-in Zookeeper.
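
For reference, the workaround boils down to switching the chart back to the bundled ZooKeeper, along these lines (just a sketch; the release name and replica count are examples):

helm upgrade myrelease bitnami/clickhouse \
  --set keeper.enabled=false \
  --set zookeeper.enabled=true \
  --set zookeeper.replicaCount=3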

brendavarguez commented 11 months ago

Is there any update?

fmulero commented 11 months ago

Sorry, there are no updates on this. I'll try to bump the priority, but we are a small team and can't give you any ETA, sorry.

mike-fischer1 commented 9 months ago

Hi, this issue is affecting us since we can't switch over to clickhouse-keeper completely, and zookeeper isn't officially supported by clickhouse anymore.

nikitamikhaylov commented 9 months ago

zookeeper isn't officially supported by clickhouse anymore.

This is not true. We still support ZooKeeper for the sake of backward compatibility and our users. However, ClickHouse Keeper proved to be much better and we've implemented several extensions which allow us to get better performance in certain scenarios.

mike-fischer1 commented 8 months ago

zookeeper isn't officially supported by clickhouse anymore.

This is not true. We still support ZooKeeper for the sake of backward compatibility and our users. However, ClickHouse Keeper proved to be much better and we've implemented several extensions which allow us to get better performance in certain scenarios.

We have a support contract with clickhouse and they really want us to use clickhouse-keeper.

mike-fischer1 commented 7 months ago

Any updates?

simonfelding commented 7 months ago

would like this to be fixed.

fmulero commented 6 months ago

I've just bumped the priority

mleklund commented 6 months ago

I have been messing with the chart and I am pretty sure the issue is that a set of keeper replicas is created for every shard. Looking over the documentation for shards and for replicas, I believe that all nodes should share a single set of keepers. Now whether the right thing to do is to create a separate statefulset of keepers (which would probably be easiest) or to only point servers to the keepers on shard 0, I will leave up to the maintainers.
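
One way to see this with the chart as-is is to compare the KEEPER_NODE_* environment variables across shards: each pod is only told about the keeper instances of its own shard, so shard0 and shard1 end up forming two separate ensembles. A rough check, assuming the pod names used earlier in this issue:

for pod in clickhouse-replicated-shard0-0 clickhouse-replicated-shard1-0; do
  echo "== $pod =="
  kubectl exec "$pod" -- printenv | grep '^KEEPER_NODE_' | sort
done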

pankaj-taneja commented 5 months ago

Any release date decided for the fix of this issue?

exfly commented 5 months ago

Any updates?

EamonZhang commented 5 months ago

I have been messing with the chart and I am pretty sure the issue is that a set of keeper replicas is created for every shard. Looking over the documentation for shards and for replicas, I believe that all nodes should share a single set of keepers. Now whether the right thing to do is to create a separate statefulset of keepers (which would probably be easiest) or to only point servers to the keepers on shard 0, I will leave up to the maintainers.

Pointing servers to the keepers on shard 0 is a temporary solution; it is easy to apply and works well:

values.yaml

       <node>
-        <host from_env="{{ printf "KEEPER_NODE_%d" $node }}"></host>
+        <host from_env="{{ printf "ZOOKEEPER_NODE_%d" $node }}"></host>
         <port>{{ $.Values.service.ports.keeper }}</port>
       </node>

statefulset.yaml

            {{- if $.Values.keeper.enabled }}
            {{- $replicas := $.Values.replicaCount | int }}
            {{- range $j, $r := until $replicas }}
             - name: {{ printf "KEEPER_NODE_%d" $j }}
               value: {{ printf "%s-shard%d-%d.%s.%s.svc.%s" (include "common.names.fullname" $ ) $i $j (include "clickhouse.headlessServiceName" $) (include "common.names.namespace" $) $.Values.clusterDomain }}
+            - name: {{ printf "ZOOKEEPER_NODE_%d" $j }}
+              value: {{ printf "%s-shard%d-%d.%s.%s.svc.%s" (include "common.names.fullname" $ ) 0 $j (include "clickhouse.headlessServiceName" $) (include "common.names.namespace" $) $.Values.clusterDomain }}
            {{- end }}
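
With that patch applied, every pod (regardless of shard) should resolve its zookeeper hosts to shard 0's keeper instances. A quick sanity check, assuming the same release name used earlier in this issue:

kubectl exec clickhouse-replicated-shard1-0 -- printenv | grep '^ZOOKEEPER_NODE_' | sort
# Expect all values to point at ...-shard0-<n>... hostnames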