Open marcleibold opened 1 year ago
Hi,
Does the issue happen when using the ZooKeeper included in the chart? Just to pinpoint where the issue could be.
Hi,
I have it configured like this now:
keeper:
enabled: false
zookeeper:
enabled: true
replicaCount: 3
And now the command completes normally:
CREATE TABLE logs_replicated ON CLUSTER default
(
`gateway_flow_id` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
PRIMARY KEY gateway_flow_id
ORDER BY gateway_flow_id
SETTINGS index_granularity = 8192
Query id: 0c7dd092-a396-4fe6-9ca9-0001a867c370
┌─host────────────────────────────────────────────────────────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-replicated-shard0-1.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │ 0 │ │ 3 │ 0 │
│ clickhouse-replicated-shard0-0.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │ 0 │ │ 2 │ 0 │
│ clickhouse-replicated-shard1-1.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │ 0 │ │ 1 │ 0 │
│ clickhouse-replicated-shard1-0.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │ 0 │ │ 0 │ 0 │
└─────────────────────────────────────────────────────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
4 rows in set. Elapsed: 0.321 sec.
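For what it's worth, this is roughly how I verify that the replicas actually registered after the statement completes (a sketch; the pod name comes from the output above, and --user/--password need to be added to match the release's admin credentials):

# Check that each replica of the new table registered its metadata in ZooKeeper/Keeper.
kubectl exec clickhouse-replicated-shard0-0 -- clickhouse-client \
  --query "SELECT database, table, replica_name, is_leader FROM system.replicas WHERE table = 'logs_replicated'"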
Thanks @marcleibold for letting us know. Have you faced the issue with the default defaultConfigurationOverrides? Did you change that value when you moved to ZooKeeper?
Hi @fmulero,
I did not change anything when I tried it out with ZooKeeper, so the defaultConfigurationOverrides were still the same as described above.
And when I now remove the defaultConfigurationOverrides from the values.yaml entirely and run the CREATE TABLE command again, it is again stuck at 49%:
CREATE TABLE logs_replicated ON CLUSTER default
(
`gateway_flow_id` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
PRIMARY KEY gateway_flow_id
ORDER BY gateway_flow_id
SETTINGS index_granularity = 8192
Query id: 3cbb9139-279a-4853-9038-a2208a08444a
┌─host────────────────────────────────────────────────────────────────────────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-replicated-shard1-0.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │ 0 │ │ 3 │ 0 │
│ clickhouse-replicated-shard1-1.clickhouse-replicated-headless.default.svc.cluster.local │ 9000 │ 0 │ │ 2 │ 0 │
└─────────────────────────────────────────────────────────────────────────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
↖ Progress: 2.00 rows, 262.00 B (0.21 rows/s., 27.98 B/s.) 49%
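If it helps with debugging, the pending task is also visible from inside one of the pods while the query hangs (a sketch; the exact column set of this system table varies a little between ClickHouse versions):

# List the most recent distributed DDL tasks and their per-host status.
clickhouse-client --query "SELECT entry, host, status, query FROM system.distributed_ddl_queue ORDER BY entry DESC LIMIT 10"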
Hi @marcleibold
I've reproduced the same issue in a simpler scenario, just enabling keeper:
helm install myrelease bitnami/clickhouse --set keeper.enabled=true --set zookeeper.enabled=false
I've checked the keeper status and it seems there are no active clients (10.42.1.26 is the IP of my pod).
$ echo stat | nc localhost 2181
ClickHouse Keeper version: v23.3.1.2823-testing-46e85357ce2da2a99f56ee83a079e892d7ec3726
Clients:
10.42.1.26:45740(recved=0,sent=0)
10.42.1.26:49358(recved=5005,sent=5006)
Latency min/avg/max: 0/0/6
Received: 5005
Sent: 5006
Connections: 1
Outstanding: 0
Zxid: 961
Mode: follower
Node count: 80
It seems something is misconfigured in Keeper. I need to investigate further; please bear with us.
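For reference, the checks above are the standard ZooKeeper four-letter-word commands, which Keeper also answers; roughly:

# Run from inside the ClickHouse pod; Keeper listens on port 2181 here.
echo ruok | nc localhost 2181   # a healthy node answers "imok"
echo mntr | nc localhost 2181   # role, zxid and (on the leader) the number of synced followers
echo stat | nc localhost 2181   # the output quoted above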
I think the issue may be here: I don't think KEEPER_SERVER_ID is actually set anywhere.
It seems like that is the issue. I also do not see KEEPER_SERVER_ID when I run set in one of the containers:
I have no name!@clickhouse-replicated-shard1-0:/$ set
APP_VERSION=23.3.1
BASH=/bin/bash
BASHOPTS=checkwinsize:cmdhist:complete_fullquote:expand_aliases:extquote:force_fignore:globasciiranges:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
BASH_ALIASES=()
BASH_ARGC=([0]="0")
BASH_ARGV=()
BASH_CMDS=()
BASH_LINENO=()
BASH_SOURCE=()
BASH_VERSINFO=([0]="5" [1]="1" [2]="4" [3]="1" [4]="release" [5]="x86_64-pc-linux-gnu")
BASH_VERSION='5.1.4(1)-release'
BITNAMI_APP_NAME=clickhouse
BITNAMI_DEBUG=false
CLICKHOUSE_ADMIN_PASSWORD=<redacted>
CLICKHOUSE_ADMIN_USER=<redacted>
CLICKHOUSE_HTTPS_PORT=8443
CLICKHOUSE_HTTP_PORT=8123
CLICKHOUSE_INTERSERVER_HTTP_PORT=9009
CLICKHOUSE_KEEPER_INTER_PORT=9444
CLICKHOUSE_KEEPER_PORT=2181
CLICKHOUSE_KEEPER_SECURE_PORT=3181
CLICKHOUSE_METRICS_PORT=8001
CLICKHOUSE_MYSQL_PORT=9004
CLICKHOUSE_POSTGRESQL_PORT=9005
CLICKHOUSE_REPLICATED_PORT=tcp://10.0.46.111:8123
CLICKHOUSE_REPLICATED_PORT_2181_TCP=tcp://10.0.46.111:2181
CLICKHOUSE_REPLICATED_PORT_2181_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_2181_TCP_PORT=2181
CLICKHOUSE_REPLICATED_PORT_2181_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_3181_TCP=tcp://10.0.46.111:3181
CLICKHOUSE_REPLICATED_PORT_3181_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_3181_TCP_PORT=3181
CLICKHOUSE_REPLICATED_PORT_3181_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_443_TCP=tcp://10.0.46.111:443
CLICKHOUSE_REPLICATED_PORT_443_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_443_TCP_PORT=443
CLICKHOUSE_REPLICATED_PORT_443_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_8001_TCP=tcp://10.0.46.111:8001
CLICKHOUSE_REPLICATED_PORT_8001_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_8001_TCP_PORT=8001
CLICKHOUSE_REPLICATED_PORT_8001_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_8123_TCP=tcp://10.0.46.111:8123
CLICKHOUSE_REPLICATED_PORT_8123_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_8123_TCP_PORT=8123
CLICKHOUSE_REPLICATED_PORT_8123_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9000_TCP=tcp://10.0.46.111:9000
CLICKHOUSE_REPLICATED_PORT_9000_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9000_TCP_PORT=9000
CLICKHOUSE_REPLICATED_PORT_9000_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9004_TCP=tcp://10.0.46.111:9004
CLICKHOUSE_REPLICATED_PORT_9004_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9004_TCP_PORT=9004
CLICKHOUSE_REPLICATED_PORT_9004_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9005_TCP=tcp://10.0.46.111:9005
CLICKHOUSE_REPLICATED_PORT_9005_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9005_TCP_PORT=9005
CLICKHOUSE_REPLICATED_PORT_9005_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9009_TCP=tcp://10.0.46.111:9009
CLICKHOUSE_REPLICATED_PORT_9009_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9009_TCP_PORT=9009
CLICKHOUSE_REPLICATED_PORT_9009_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9440_TCP=tcp://10.0.46.111:9440
CLICKHOUSE_REPLICATED_PORT_9440_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9440_TCP_PORT=9440
CLICKHOUSE_REPLICATED_PORT_9440_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_PORT_9444_TCP=tcp://10.0.46.111:9444
CLICKHOUSE_REPLICATED_PORT_9444_TCP_ADDR=10.0.46.111
CLICKHOUSE_REPLICATED_PORT_9444_TCP_PORT=9444
CLICKHOUSE_REPLICATED_PORT_9444_TCP_PROTO=tcp
CLICKHOUSE_REPLICATED_SERVICE_HOST=10.0.46.111
CLICKHOUSE_REPLICATED_SERVICE_PORT=8123
CLICKHOUSE_REPLICATED_SERVICE_PORT_HTTP=8123
CLICKHOUSE_REPLICATED_SERVICE_PORT_HTTPS=443
CLICKHOUSE_REPLICATED_SERVICE_PORT_HTTP_INTERSRV=9009
CLICKHOUSE_REPLICATED_SERVICE_PORT_HTTP_METRICS=8001
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP=9000
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_KEEPER=2181
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_KEEPERINTER=9444
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_KEEPERTLS=3181
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_MYSQL=9004
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_POSTGRESQL=9005
CLICKHOUSE_REPLICATED_SERVICE_PORT_TCP_SECURE=9440
CLICKHOUSE_REPLICA_ID=clickhouse-replicated-shard1-0
CLICKHOUSE_SHARD_ID=shard1
CLICKHOUSE_TCP_PORT=9000
CLICKHOUSE_TCP_SECURE_PORT=9440
CLICKHOUSE_TLS_CA_FILE=/opt/bitnami/clickhouse/certs/ca.crt
CLICKHOUSE_TLS_CERT_FILE=/opt/bitnami/clickhouse/certs/tls.crt
CLICKHOUSE_TLS_KEY_FILE=/opt/bitnami/clickhouse/certs/tls.key
COLUMNS=155
DIRSTACK=()
EUID=1001
GROUPS=()
HISTFILE=//.bash_history
HISTFILESIZE=500
HISTSIZE=500
HOME=/
HOSTNAME=clickhouse-replicated-shard1-0
HOSTTYPE=x86_64
IFS=$' \t\n'
KEEPER_NODE_0=clickhouse-replicated-shard1-0.clickhouse-replicated-headless.default.svc.cluster.local
KEEPER_NODE_1=clickhouse-replicated-shard1-1.clickhouse-replicated-headless.default.svc.cluster.local
KUBERNETES_PORT=tcp://10.0.32.1:443
KUBERNETES_PORT_443_TCP=tcp://10.0.32.1:443
KUBERNETES_PORT_443_TCP_ADDR=10.0.32.1
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_HOST=10.0.32.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443
LINES=17
MACHTYPE=x86_64-pc-linux-gnu
MAILCHECK=60
OPTERR=1
OPTIND=1
OSTYPE=linux-gnu
OS_ARCH=amd64
OS_FLAVOUR=debian-11
OS_NAME=linux
PATH=/opt/bitnami/common/bin:/opt/bitnami/clickhouse/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PIPESTATUS=([0]="1")
PPID=0
PS1='${debian_chroot:+($debian_chroot)}\u@\h:\w\$ '
PS2='> '
PS4='+ '
PWD=/
SHELL=/bin/sh
SHELLOPTS=braceexpand:emacs:hashall:histexpand:history:interactive-comments:monitor
SHLVL=1
TERM=xterm
UID=1001
_=']'
clickhouseCTL_API=3
Although the variable should be set in this script.
The logic also works completely fine, as I just tested it inside my container:
I have no name!@clickhouse-replicated-shard1-0:/$ echo $KEEPER_SERVER_ID
I have no name!@clickhouse-replicated-shard1-0:/$ if [[ -f "/bitnami/clickhouse/keeper/data/myid" ]]; then
    export KEEPER_SERVER_ID="$(cat /bitnami/clickhouse/keeper/data/myid)"
else
    HOSTNAME="$(hostname -s)"
    if [[ $HOSTNAME =~ (.*)-([0-9]+)$ ]]; then
        export KEEPER_SERVER_ID=${BASH_REMATCH[2]}
    else
        echo "Failed to get index from hostname $HOST"
        exit 1
    fi
fi
I have no name!@clickhouse-replicated-shard1-0:/$ echo $KEEPER_SERVER_ID
0
I have no name!@clickhouse-replicated-shard1-0:/$
The script is also present in the ConfigMap, but it is apparently just not executed for some reason.
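(For anyone who wants to double-check, the rendered script can be read straight from the ConfigMap; the exact ConfigMap name below is my guess based on the release name:)

# Find and inspect the scripts ConfigMap shipped by the chart.
kubectl get configmap -l app.kubernetes.io/instance=clickhouse-replicated -o name
kubectl get configmap clickhouse-replicated-scripts -o yaml   # hypothetical name, take it from the previous command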
Another thing I checked: since the last line in the script is the following:
exec /opt/bitnami/scripts/clickhouse/entrypoint.sh /opt/bitnami/scripts/clickhouse/run.sh -- --listen_host=0.0.0.0
there should be a process called setup.sh running after the script has run (which is also the case when it is run manually). That process is not there when I run top, so the issue is almost certainly in the place where the script is supposed to be executed.
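A quick way to confirm what the container is actually told to run, as opposed to what ships in the ConfigMap, is to dump the StatefulSet's command and args (a sketch; the StatefulSet name is inferred from the pod name):

# Print the container command and args of the shard's StatefulSet.
kubectl get statefulset clickhouse-replicated-shard1 \
  -o jsonpath='{.spec.template.spec.containers[0].command}{"\n"}{.spec.template.spec.containers[0].args}{"\n"}'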
Thanks a lot for all the clues! I made some changes and ran some tests, but it is taking me longer than expected and I also have some issues with shards. I've just opened an internal task to address it. We will keep you posted on any news.
Alright, thanks for your effort on this and for keeping me posted!
I have the same problem: the init script doesn't work.
initdbScripts:
create-extra-db.sql: |
CREATE DATABASE [IF NOT EXISTS] test_datasets;
GRANT ALL ON test_datasets.* TO clickhouse;
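(As a quick sanity check it can help to run the statements directly, which rules the SQL itself in or out; note that the square brackets around IF NOT EXISTS are the documentation's optional-clause notation, so the literal statements would be:)

# Run the init statements by hand inside a pod (sketch; add credentials as needed).
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS test_datasets"
clickhouse-client --query "GRANT ALL ON test_datasets.* TO clickhouse"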
Were you able to fix it?
Sorry, there are no updates on this 😞
Any workaround here?
Not as far as I know; just use the built-in ZooKeeper.
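In practice that workaround is roughly the following, using the same values that worked earlier in this thread:

# Fall back to the bundled ZooKeeper instead of Keeper (sketch).
helm upgrade --install myrelease bitnami/clickhouse \
  --set keeper.enabled=false \
  --set zookeeper.enabled=true \
  --set zookeeper.replicaCount=3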
Is there any update?
Sorry, there are no updates on this. I'll try to bump the priority, but we are a small team and we can't give you any ETA, sorry.
Hi, this issue is affecting us, since we can't switch over to clickhouse-keeper completely and ZooKeeper isn't officially supported by ClickHouse anymore.
Regarding "ZooKeeper isn't officially supported by ClickHouse anymore": this is not true. We still support ZooKeeper for the sake of backward compatibility and our users. However, ClickHouse Keeper has proved to be much better, and we've implemented several extensions that give better performance in certain scenarios.
We have a support contract with ClickHouse and they really want us to use clickhouse-keeper.
Any updates?
I would like this to be fixed.
I've just bumped the priority
I have been messing with the chart and I am pretty sure the issue is that a set of keeper replicas is created for every shard. Looking over the documentation for shards and for replicas, I believe that all nodes should share a single set of keepers. Now whether the right thing to do is to create a separate statefulset of keepers (which would probably be easiest) or to only point servers to the keepers on shard 0, I will leave up to the maintainers.
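Concretely, the end state I would expect (going by the ClickHouse replication docs, not by what the chart currently renders) is every server on every shard pointing at one and the same keeper ensemble, i.e. a zookeeper config section along these lines, with placeholder hostnames:

# Illustrative only; the hostnames are placeholders, not what the chart emits today.
cat > zookeeper-override.xml <<'EOF'
<clickhouse>
  <zookeeper>
    <node><host>clickhouse-keeper-0.example.svc.cluster.local</host><port>2181</port></node>
    <node><host>clickhouse-keeper-1.example.svc.cluster.local</host><port>2181</port></node>
    <node><host>clickhouse-keeper-2.example.svc.cluster.local</host><port>2181</port></node>
  </zookeeper>
</clickhouse>
EOF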
Any release date decided for the fix of this issue?
Any updates?
The "point servers to the keepers on shard 0" approach works well as a temporary solution and is easy to apply:
values.yaml
<node>
- <host from_env="{{ printf "KEEPER_NODE_%d" $node }}"></host>
+ <host from_env="{{ printf "ZOOKEEPER_NODE_%d" $node }}"></host>
<port>{{ $.Values.service.ports.keeper }}</port>
</node>
statefulset.yaml
{{- if $.Values.keeper.enabled }}
{{- $replicas := $.Values.replicaCount | int }}
{{- range $j, $r := until $replicas }}
- name: {{ printf "KEEPER_NODE_%d" $j }}
value: {{ printf "%s-shard%d-%d.%s.%s.svc.%s" (include "common.names.fullname" $ ) $i $j (include "clickhouse.headlessServiceName" $) (include "common.names.namespace" $) $.Values.clusterDomain }}
+ - name: {{ printf "ZOOKEEPER_NODE_%d" $j }}
+ value: {{ printf "%s-shard%d-%d.%s.%s.svc.%s" (include "common.names.fullname" $ ) 0 $j (include "clickhouse.headlessServiceName" $) (include "common.names.namespace" $) $.Values.clusterDomain }}
{{- end }}
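To make sure the patched templates really render the shard-0 hostnames before rolling them out, a render-and-grep pass helps (a sketch, assuming the patched chart is checked out locally in ./clickhouse):

# Render the locally patched chart and inspect the injected env vars.
helm template myrelease ./clickhouse -f values.yaml | grep -A1 'ZOOKEEPER_NODE_'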
Name and Version
bitnami/clickhouse 3.1.5
What architecture are you using?
amd64
What steps will reproduce the bug?
Result: Pods are running without any suspicious logs, but when you either exec into them or run a command from the web UI that is executed ON CLUSTER, the progress indicator never goes past 49%. This was tried with a CREATE TABLE statement creating a ReplicatedMergeTree on the cluster. The ClickHouse cluster consists of 2 shards and 2 replicas.
Are you using any custom parameters or values?
Our values.yaml:
What is the expected behavior?
The expected behaviour is normal creation of the tables within the distributed_ddl_task_timeout.
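(For reference, the timeout in play can be read from any client session; a sketch, the default is 180 seconds:)

clickhouse-client --query "SELECT name, value FROM system.settings WHERE name = 'distributed_ddl_task_timeout'"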
What do you see instead?
The table creation (tested with the clickhouse-client command after exec-ing into the pod) is stuck at 49% progress. When aborted, the table seems to have been created.
When trying to drop the tables, the same problem occurs: the tables seem to have been created, but the command doesn't finish, so I believe ClickHouse Keeper doesn't answer the command but does execute it.
When trying to create the table again (because I assumed Keeper executed the last command), the command tells me that the replica already exists, not the table itself. So the problem seems to lie somewhere with the replicas.
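A way to look at the leftover replica entries that error refers to is to read them back from Keeper through ClickHouse itself (a sketch; the path assumes the {shard} and {database} macros from the CREATE TABLE above expand to shard1 and default):

clickhouse-client --query "SELECT name FROM system.zookeeper WHERE path = '/clickhouse/tables/shard1/default/logs_replicated/replicas'"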
Additional information
No response