Open: liubo-it opened this issue 1 month ago
@Slach Can you help me? I'm following the documentation
Please stop sharing text as images; this is mental degradation.
Which instruction did you follow exactly? Share a link.
Sorry, I referred to the following document to deploy clickhouse-keeper, and I get an error when I start the clickhouse-keeper-02 pod.
Error log:
2024.08.03 05:14:40.671867 [ 22 ] {} <Debug> KeeperSnapshotManagerS3: Shutting down KeeperSnapshotManagerS3
2024.08.03 05:14:40.671899 [ 22 ] {} <Information> KeeperSnapshotManagerS3: KeeperSnapshotManagerS3 shut down
2024.08.03 05:14:40.671911 [ 22 ] {} <Debug> KeeperDispatcher: Dispatcher shut down
2024.08.03 05:14:40.672404 [ 22 ] {} <Error> Application: Code: 568. DB::Exception: At least one of servers should be able to start as leader (without <start_as_follower>). (RAFT_ERROR), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000000e42fdb in /usr/bin/clickhouse-keeper
1. DB::Exception::Exception<char const (&) [88]>(int, char const (&) [88]) @ 0x000000000086a740 in /usr/bin/clickhouse-keeper
2. DB::KeeperStateManager::parseServersConfiguration(Poco::Util::AbstractConfiguration const&, bool, bool) const @ 0x0000000000869595 in /usr/bin/clickhouse-keeper
3. DB::KeeperStateManager::KeeperStateManager(int, String const&, String const&, Poco::Util::AbstractConfiguration const&, std::shared_ptr<DB::CoordinationSettings> const&, std::shared_ptr<DB::KeeperContext>) @ 0x000000000086b08b in /usr/bin/clickhouse-keeper
4. DB::KeeperServer::KeeperServer(std::shared_ptr<DB::KeeperConfigurationAndSettings> const&, Poco::Util::AbstractConfiguration const&, ConcurrentBoundedQueue<DB::KeeperStorage::ResponseForSession>&, ConcurrentBoundedQueue<DB::CreateSnapshotTask>&, std::shared_ptr<DB::KeeperContext>, DB::KeeperSnapshotManagerS3&, std::function<void (unsigned long, DB::KeeperStorage::RequestForSession const&)>) @ 0x0000000000802bc1 in /usr/bin/clickhouse-keeper
5. DB::KeeperDispatcher::initialize(Poco::Util::AbstractConfiguration const&, bool, bool, std::shared_ptr<DB::Macros const> const&) @ 0x00000000007e81c6 in /usr/bin/clickhouse-keeper
6. DB::Context::initializeKeeperDispatcher(bool) const @ 0x0000000000a5bb06 in /usr/bin/clickhouse-keeper
7. DB::Keeper::main(std::vector<String, std::allocator<String>> const&) @ 0x0000000000b771e9 in /usr/bin/clickhouse-keeper
8. Poco::Util::Application::run() @ 0x0000000000ffbf26 in /usr/bin/clickhouse-keeper
9. DB::Keeper::run() @ 0x0000000000b73f7e in /usr/bin/clickhouse-keeper
10. Poco::Util::ServerApplication::run(int, char**) @ 0x0000000001012d39 in /usr/bin/clickhouse-keeper
11. mainEntryClickHouseKeeper(int, char**) @ 0x0000000000b72ef8 in /usr/bin/clickhouse-keeper
12. main @ 0x0000000000b81b1d in /usr/bin/clickhouse-keeper
(version 23.10.5.20 (official build))
2024.08.03 05:14:40.672441 [ 22 ] {} <Error> Application: DB::Exception: At least one of servers should be able to start as leader (without <start_as_follower>)
2024.08.03 05:14:40.672446 [ 22 ] {} <Information> Application: shutting down
2024.08.03 05:14:40.672449 [ 22 ] {} <Debug> Application: Uninitializing subsystem: Logging Subsystem
2024.08.03 05:14:40.672565 [ 23 ] {} <Trace> BaseDaemon: Received signal -2
2024.08.03 05:14:40.672601 [ 23 ] {} <Information> BaseDaemon: Stop SignalListener thread
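The RAFT_ERROR above indicates that every <server> entry in the raft_configuration that keeper loaded carries <start_as_follower>true</start_as_follower>, so no node is permitted to become leader. A minimal sketch for inspecting the config that keeperStart.sh generated on the failing pod (the pod and namespace names are assumptions based on the StatefulSet posted below):

# Pod/namespace names are assumed from the manifest below; adjust to your deployment.
kubectl exec -n wukong-application wukong-clickhouse-keeper-1 -- \
  cat /tmp/clickhouse-keeper/config.d/generated-keeper-settings.xml
# Keeper refuses to start (code 568) if every <server> element in this file
# contains <start_as_follower>true</start_as_follower>.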
describe
k8s resource file
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: wukong-clickhouse-keeper-local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: clickhouse-keeper-local-pv-0
namespace: wukong-application
labels:
name: clickhouse-keeper-local-pv-0
spec:
capacity:
storage: 50Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: wukong-clickhouse-keeper-local-storage
hostPath:
path: /data/tingyun/wukong/tingyun/common/clickhouse-keeper/data0
type: DirectoryOrCreate
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- 10.128.9.10
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: clickhouse-keeper-local-pv-1
namespace: wukong-application
labels:
name: clickhouse-keeper-local-pv-1
spec:
capacity:
storage: 50Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: wukong-clickhouse-keeper-local-storage
hostPath:
path: /data/tingyun/wukong/tingyun/common/clickhouse-keeper/data1
type: DirectoryOrCreate
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- 10.128.9.10
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: clickhouse-keeper-local-pv-2
namespace: wukong-application
labels:
name: clickhouse-keeper-local-pv-2
spec:
capacity:
storage: 50Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: wukong-clickhouse-keeper-local-storage
hostPath:
path: /data/tingyun/wukong/tingyun/common/clickhouse-keeper/data2
type: DirectoryOrCreate
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- 10.128.9.10
---
apiVersion: v1
kind: Service
metadata:
name: wukong-clickhouse-keeper-hs
namespace: wukong-application
labels:
app: wukong-clickhouse-keeper
spec:
ports:
- port: 9234
name: raft
clusterIP: None
selector:
app: wukong-clickhouse-keeper
---
apiVersion: v1
kind: Service
metadata:
name: wukong-clickhouse-keeper
namespace: wukong-application
labels:
app: wukong-clickhouse-keeper
annotations:
service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
prometheus.io/port: "9363"
prometheus.io/scrape: "true"
spec:
ports:
- port: 2181
name: client
- port: 9363
name: prometheus
selector:
app: wukong-clickhouse-keeper
---
apiVersion: v1
kind: ConfigMap
metadata:
name: wukong-clickhouse-keeper
namespace: wukong-application
labels:
app: wukong-clickhouse-keeper
data:
keeper_config.xml: |
<clickhouse>
<include_from>/tmp/clickhouse-keeper/config.d/generated-keeper-settings.xml</include_from>
<logger>
<level>trace</level>
<console>true</console>
</logger>
<listen_host>::</listen_host>
<keeper_server incl="keeper_server">
<enable_reconfiguration>true</enable_reconfiguration>
<path>/var/lib/clickhouse-keeper</path>
<tcp_port>2181</tcp_port>
<four_letter_word_white_list>*</four_letter_word_white_list>
<coordination_settings>
<!-- <raft_logs_level>trace</raft_logs_level> -->
<raft_logs_level>information</raft_logs_level>
</coordination_settings>
</keeper_server>
<prometheus>
<endpoint>/metrics</endpoint>
<port>9363</port>
<metrics>true</metrics>
<events>true</events>
<asynchronous_metrics>true</asynchronous_metrics>
<status_info>true</status_info>
</prometheus>
</clickhouse>
---
apiVersion: v1
kind: ConfigMap
metadata:
name: wukong-clickhouse-keeper-scripts
namespace: wukong-application
labels:
app: wukong-clickhouse-keeper-scripts
data:
env.sh: |
#!/usr/bin/env bash
export DOMAIN=`hostname -d`
export CLIENT_HOST=clickhouse-keeper
export CLIENT_PORT=2181
export RAFT_PORT=9234
keeperFunctions.sh: |
#!/usr/bin/env bash
set -ex
function keeperConfig() {
echo "$HOST.$DOMAIN:$RAFT_PORT;$ROLE;$WEIGHT"
}
function keeperConnectionString() {
# If the client service address is not yet available, then return localhost
set +e
getent hosts "${CLIENT_HOST}" 2>/dev/null 1>/dev/null
if [[ $? -ne 0 ]]; then
set -e
echo "-h localhost -p ${CLIENT_PORT}"
else
set -e
echo "-h ${CLIENT_HOST} -p ${CLIENT_PORT}"
fi
}
keeperStart.sh: |
#!/usr/bin/env bash
set -ex
source /conf/env.sh
source /conf/keeperFunctions.sh
HOST=`hostname -s`
if [[ $HOST =~ (.*)-([0-9]+)$ ]]; then
NAME=${BASH_REMATCH[1]}
ORD=${BASH_REMATCH[2]}
else
echo Failed to parse name and ordinal of Pod
exit 1
fi
export MY_ID=$((ORD+1))
set +e
getent hosts $DOMAIN
if [[ $? -eq 0 ]]; then
ACTIVE_ENSEMBLE=true
else
ACTIVE_ENSEMBLE=false
fi
set -e
mkdir -p /tmp/clickhouse-keeper/config.d/
if [[ "true" == "${ACTIVE_ENSEMBLE}" ]]; then
# get current config from clickhouse-keeper
CURRENT_KEEPER_CONFIG=$(clickhouse-keeper-client --history-file=/dev/null -h ${CLIENT_HOST} -p ${CLIENT_PORT} -q "get /keeper/config" || true)
# generate dynamic config, add current server to xml
{
echo "<yandex><keeper_server>"
echo "<server_id>${MY_ID}</server_id>"
echo "<raft_configuration>"
if [[ "0" == $(echo "${CURRENT_KEEPER_CONFIG}" | grep -c "${HOST}.${DOMAIN}") ]]; then
echo "<server><id>${MY_ID}</id><hostname>${HOST}.${DOMAIN}</hostname><port>${RAFT_PORT}</port><priority>1</priority><start_as_follower>true</start_as_follower></server>"
fi
while IFS= read -r line; do
id=$(echo "$line" | cut -d '=' -f 1 | cut -d '.' -f 2)
if [[ "" != "${id}" ]]; then
hostname=$(echo "$line" | cut -d '=' -f 2 | cut -d ';' -f 1 | cut -d ':' -f 1)
port=$(echo "$line" | cut -d '=' -f 2 | cut -d ';' -f 1 | cut -d ':' -f 2)
priority=$(echo "$line" | cut -d ';' -f 3)
priority=${priority:-1}
port=${port:-$RAFT_PORT}
echo "<server><id>$id</id><hostname>$hostname</hostname><port>$port</port><priority>$priority</priority></server>"
fi
done <<< "$CURRENT_KEEPER_CONFIG"
echo "</raft_configuration>"
echo "</keeper_server></yandex>"
} > /tmp/clickhouse-keeper/config.d/generated-keeper-settings.xml
else
# generate dynamic config, add current server to xml
{
echo "<yandex><keeper_server>"
echo "<server_id>${MY_ID}</server_id>"
echo "<raft_configuration>"
echo "<server><id>${MY_ID}</id><hostname>${HOST}.${DOMAIN}</hostname><port>${RAFT_PORT}</port><priority>1</priority></server>"
echo "</raft_configuration>"
echo "</keeper_server></yandex>"
} > /tmp/clickhouse-keeper/config.d/generated-keeper-settings.xml
fi
# run clickhouse-keeper
cat /tmp/clickhouse-keeper/config.d/generated-keeper-settings.xml
rm -rfv /var/lib/clickhouse-keeper/terminated
clickhouse-keeper --config-file=/etc/clickhouse-keeper/keeper_config.xml
keeperTeardown.sh: |
#!/usr/bin/env bash
set -ex
exec > /proc/1/fd/1
exec 2> /proc/1/fd/2
source /conf/env.sh
source /conf/keeperFunctions.sh
set +e
KEEPER_URL=$(keeperConnectionString)
set -e
HOST=`hostname -s`
if [[ $HOST =~ (.*)-([0-9]+)$ ]]; then
NAME=${BASH_REMATCH[1]}
ORD=${BASH_REMATCH[2]}
else
echo Failed to parse name and ordinal of Pod
exit 1
fi
export MY_ID=$((ORD+1))
CURRENT_KEEPER_CONFIG=$(clickhouse-keeper-client --history-file=/dev/null -h localhost -p ${CLIENT_PORT} -q "get /keeper/config")
CLUSTER_SIZE=$(echo -e "${CURRENT_KEEPER_CONFIG}" | grep -c -E '^server\.[0-9]+=')
echo "CLUSTER_SIZE=$CLUSTER_SIZE, MyId=$MY_ID"
# If CLUSTER_SIZE > 1, this server is being permanently removed from raft_configuration.
if [[ "$CLUSTER_SIZE" -gt "1" ]]; then
clickhouse-keeper-client --history-file=/dev/null -q "reconfig remove $MY_ID" ${KEEPER_URL}
fi
# Wait to remove $MY_ID from quorum
# for (( i = 0; i < 6; i++ )); do
# CURRENT_KEEPER_CONFIG=$(clickhouse-keeper-client --history-file=/dev/null -h localhost -p ${CLIENT_PORT} -q "get /keeper/config")
# if [[ "0" == $(echo -e "${CURRENT_KEEPER_CONFIG}" | grep -c -E "^server.${MY_ID}=$HOST.+participant;[0-1]$") ]]; then
# echo "$MY_ID removed from quorum"
# break
# else
# echo "$MY_ID still present in quorum"
# fi
# sleep 1
# done
# Wait for client connections to drain. Kubernetes will wait until the configured
# "terminationGracePeriodSeconds" before forcibly killing the container
for (( i = 0; i < 3; i++ )); do
CONN_COUNT=`echo $(exec 3<>/dev/tcp/127.0.0.1/2181 ; printf "cons" >&3 ; IFS=; tee <&3; exec 3<&- ;) | grep -v "^$" | grep -v "127.0.0.1" | wc -l`
if [[ "$CONN_COUNT" -gt "0" ]]; then
echo "$CONN_COUNT non-local connections still connected."
sleep 1
else
echo "$CONN_COUNT non-local connections"
break
fi
done
touch /var/lib/clickhouse-keeper/terminated
# Kill the primary process ourselves to circumvent the terminationGracePeriodSeconds
ps -ef | grep clickhouse-keeper | grep -v grep | awk '{print $1}' | xargs kill
keeperLive.sh: |
#!/usr/bin/env bash
set -ex
source /conf/env.sh
OK=$(exec 3<>/dev/tcp/127.0.0.1/${CLIENT_PORT} ; printf "ruok" >&3 ; IFS=; tee <&3; exec 3<&- ;)
# Check to see if keeper service answers
if [[ "$OK" == "imok" ]]; then
exit 0
else
exit 1
fi
keeperReady.sh: |
#!/usr/bin/env bash
set -ex
exec > /proc/1/fd/1
exec 2> /proc/1/fd/2
source /conf/env.sh
source /conf/keeperFunctions.sh
HOST=`hostname -s`
# Check to see if clickhouse-keeper service answers
set +e
getent hosts $DOMAIN
if [[ $? -ne 0 ]]; then
echo "no active DNS records in service, first running pod"
exit 0
elif [[ -f /var/lib/clickhouse-keeper/terminated ]]; then
echo "termination in progress"
exit 0
else
set -e
# An ensemble exists, check to see if this node is already a member.
# Extract resource name and this members' ordinal value from pod hostname
if [[ $HOST =~ (.*)-([0-9]+)$ ]]; then
NAME=${BASH_REMATCH[1]}
ORD=${BASH_REMATCH[2]}
else
echo "Failed to parse name and ordinal of Pod"
exit 1
fi
MY_ID=$((ORD+1))
CURRENT_KEEPER_CONFIG=$(clickhouse-keeper-client --history-file=/dev/null -h ${CLIENT_HOST} -p ${CLIENT_PORT} -q "get /keeper/config" || exit 0)
# Check to see if clickhouse-keeper for this node is a participant in raft cluster
if [[ "1" == $(echo -e "${CURRENT_KEEPER_CONFIG}" | grep -c -E "^server.${MY_ID}=${HOST}.+participant;1$") ]]; then
echo "clickhouse-keeper instance is available and an active participant"
exit 0
else
echo "clickhouse-keeper instance is ready to add as participant with 1 weight."
ROLE=participant
WEIGHT=1
KEEPER_URL=$(keeperConnectionString)
NEW_KEEPER_CONFIG=$(keeperConfig)
clickhouse-keeper-client --history-file=/dev/null -q "reconfig add 'server.$MY_ID=$NEW_KEEPER_CONFIG'" ${KEEPER_URL}
exit 0
fi
fi
---
# Setup ClickHouse Keeper StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
# nodes would be named as clickhouse-keeper-0, clickhouse-keeper-1, clickhouse-keeper-2
name: wukong-clickhouse-keeper
namespace: wukong-application
labels:
app: wukong-clickhouse-keeper
spec:
selector:
matchLabels:
app: wukong-clickhouse-keeper
serviceName: wukong-clickhouse-keeper-hs
replicas: 3
template:
metadata:
labels:
app: wukong-clickhouse-keeper
annotations:
prometheus.io/port: '9363'
prometheus.io/scrape: 'true'
spec:
volumes:
- name: wukong-clickhouse-keeper-settings
configMap:
name: wukong-clickhouse-keeper
items:
- key: keeper_config.xml
path: keeper_config.xml
- name: wukong-clickhouse-keeper-scripts
configMap:
name: wukong-clickhouse-keeper-scripts
defaultMode: 0755
containers:
- name: wukong-clickhouse-keeper
imagePullPolicy: IfNotPresent
image: "ccr.ccs.tencentyun.com/wukong-common/clickhouse-keeper:23.10.5.20"
resources:
requests:
memory: "256M"
cpu: "100m"
limits:
memory: "4Gi"
cpu: "1000m"
volumeMounts:
- name: wukong-clickhouse-keeper-settings
mountPath: /etc/clickhouse-keeper/
- name: wukong-clickhouse-keeper-scripts
mountPath: /conf/
- name: data
mountPath: /var/lib/clickhouse-keeper
command:
- /conf/keeperStart.sh
lifecycle:
preStop:
exec:
command:
- /conf/keeperTeardown.sh
livenessProbe:
exec:
command:
- /conf/keeperLive.sh
initialDelaySeconds: 60
timeoutSeconds: 10
readinessProbe:
exec:
command:
- /conf/keeperReady.sh
initialDelaySeconds: 60
timeoutSeconds: 10
ports:
- containerPort: 2181
name: client
protocol: TCP
- containerPort: 9234
name: quorum
protocol: TCP
- containerPort: 9363
name: metrics
protocol: TCP
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: wukong-clickhouse-keeper-local-storage
resources:
requests:
storage: 50Gi
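For context on the scripts above: keeperStart.sh and keeperReady.sh parse the output of "get /keeper/config" line by line as server.<id>=<hostname>:<port>;<role>;<priority>. Purely as an illustration (the hostnames assume the pod DNS names this StatefulSet and its headless service would produce), a healthy three-node config would look roughly like:

server.1=wukong-clickhouse-keeper-0.wukong-clickhouse-keeper-hs.wukong-application.svc.cluster.local:9234;participant;1
server.2=wukong-clickhouse-keeper-1.wukong-clickhouse-keeper-hs.wukong-application.svc.cluster.local:9234;participant;1
server.3=wukong-clickhouse-keeper-2.wukong-clickhouse-keeper-hs.wukong-application.svc.cluster.local:9234;participant;1

Note that if the headless service already has DNS records but the "get /keeper/config" call in keeperStart.sh returns nothing (for example, the first node is not reachable yet), the script takes the ACTIVE_ENSEMBLE branch and writes only the current server with <start_as_follower>true</start_as_follower>, which matches the startup error quoted above.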
This problem also exists when I use the helm chart.
command: helm install clickhouse-keeper --generate-name
link: https://artifacthub.io/packages/helm/duyet/clickhouse-keeper?modal=install
screenshot
This is not the official helm chart.
Did you run
kubectl apply -n <namespace> -f https://github.com/Altinity/clickhouse-operator/blob/master/deploy/clickhouse-keeper/clickhouse-keeper-manually/clickhouse-keeper-3-nodes.yaml
only once, or did you do something else?
Application: Code: 568. DB::Exception: At least one of servers should be able to start as leader (without <start_as_follower>)
Try to execute on the live pods:
clickhouse-keeper client -q "get /keeper/config"
grep -C 10 start_as_follower -r /etc/clickhouse-keeper/
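A hedged sketch of running those checks from outside the cluster, assuming the pod and namespace names from the manifest posted above:

# Names are assumptions based on the posted StatefulSet; adjust to your deployment.
for pod in wukong-clickhouse-keeper-0 wukong-clickhouse-keeper-1 wukong-clickhouse-keeper-2; do
  kubectl exec -n wukong-application "$pod" -- \
    clickhouse-keeper-client -h localhost -p 2181 -q "get /keeper/config" || true
  kubectl exec -n wukong-application "$pod" -- \
    grep -C 10 -r start_as_follower /etc/clickhouse-keeper/ /tmp/clickhouse-keeper/config.d/ || true
done

The second grep also covers /tmp/clickhouse-keeper/config.d/, since that is where keeperStart.sh writes the dynamically generated raft_configuration.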
@liubo-it could you check
kubectl apply -n <namespace> -f https://github.com/Altinity/clickhouse-operator/blob/0.24.0/deploy/clickhouse-keeper/clickhouse-keeper-manually/clickhouse-keeper-3-nodes.yaml
?
@liubo-it any news from your side?
any news from your side?
The problem still occurs, so I took a different approach.