apecloud / kubeblocks-addons

KubeBlocks add-ons.
Apache License 2.0

[Feature]PolarDB-X member reconfiguration support #4

Open ahjing99 opened 1 year ago

ahjing99 commented 1 year ago

➜ ~ kbcli version
Kubernetes: v1.27.3-gke.100
KubeBlocks: 0.7.0-beta.18
kbcli: 0.7.0-beta.18

1. Create PolarDB-X

`helm repo add kubeblocks-kbcli https://jihulab.com/api/v4/projects/150246/packages/helm/stable`

"kubeblocks-kbcli" already exists with the same configuration, skipping

`helm repo update kubeblocks-kbcli`

Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "kubeblocks-kbcli" chart repository
Update Complete. ⎈Happy Helming!⎈

`helm upgrade --install polardbx kubeblocks-kbcli/polardbx --version 0.7.0-beta.18`

Release "polardbx" has been upgraded. Happy Helming!
NAME: polardbx
LAST DEPLOYED: Fri Nov 3 11:57:54 2023
NAMESPACE: default
STATUS: deployed
REVISION: 4
TEST SUITE: None
NOTES:
Thanks for installing PolarDB-X using KubeBlocks!

`kbcli cluster create polardbx-tjxuol --termination-policy=Halt --monitoring-interval=0 --enable-all-logs=false --cluster-definition=polardbx --cluster-version=polardbx-v1.4.1 --set cpu=500m,memory=1Gi,replicas=3,storage=5Gi --namespace default`

Cluster polardbx-tjxuol created

➜ ~ kbcli cluster describe polardbx-tjxuol
Name: polardbx-tjxuol    Created Time: Nov 03,2023 11:58 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION           STATUS    TERMINATION-POLICY
default     polardbx             polardbx-v1.4.1   Running   WipeOut

Endpoints:
COMPONENT   MODE        INTERNAL                                              EXTERNAL
gms         ReadWrite   polardbx-tjxuol-gms.default.svc.cluster.local:3306
                        polardbx-tjxuol-gms.default.svc.cluster.local:9104
dn          ReadWrite   polardbx-tjxuol-dn.default.svc.cluster.local:3306
cn          ReadWrite   polardbx-tjxuol-cn.default.svc.cluster.local:3306
                        polardbx-tjxuol-cn.default.svc.cluster.local:9104
cdc         ReadWrite   polardbx-tjxuol-cdc.default.svc.cluster.local:3306
                        polardbx-tjxuol-cdc.default.svc.cluster.local:9104

Topology:
COMPONENT   INSTANCE                ROLE       STATUS    AZ              NODE                                                CREATED-TIME
cdc         polardbx-tjxuol-cdc-0              Running   us-central1-c   gke-yijing-default-pool-3e14ea35-klwc/10.128.0.26   Nov 03,2023 11:58 UTC+0800
cn          polardbx-tjxuol-cn-0               Running   us-central1-c   gke-yijing-default-pool-3e14ea35-klwc/10.128.0.26   Nov 03,2023 11:58 UTC+0800
dn          polardbx-tjxuol-dn-0    follower   Running   us-central1-c   gke-yijing-default-pool-3e14ea35-hqtr/10.128.0.30   Nov 03,2023 11:58 UTC+0800
dn          polardbx-tjxuol-dn-1    leader     Running   us-central1-c   gke-yijing-default-pool-3e14ea35-hxpl/10.128.0.28   Nov 03,2023 11:58 UTC+0800
dn          polardbx-tjxuol-dn-2    follower   Running   us-central1-c   gke-yijing-default-pool-3e14ea35-klwc/10.128.0.26   Nov 03,2023 11:58 UTC+0800
gms         polardbx-tjxuol-gms-0   leader     Running   us-central1-c   gke-yijing-default-pool-3e14ea35-wg54/10.128.0.35   Nov 03,2023 11:58 UTC+0800
gms         polardbx-tjxuol-gms-1   follower   Running   us-central1-c   gke-yijing-default-pool-3e14ea35-wg54/10.128.0.35   Nov 03,2023 11:58 UTC+0800
gms         polardbx-tjxuol-gms-2   follower   Running   us-central1-c   gke-yijing-default-pool-3e14ea35-klwc/10.128.0.26   Nov 03,2023 11:58 UTC+0800

Resources Allocation:
COMPONENT   DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
gms         false       500m / 500m          1Gi / 1Gi               data:5Gi       kb-default-sc
dn          false       1 / 1                1Gi / 1Gi               data:20Gi      kb-default-sc
cn          false       1 / 1                1Gi / 1Gi               data:20Gi      kb-default-sc
cdc         false       1 / 1                1Gi / 1Gi               data:20Gi      kb-default-sc

Images:
COMPONENT   TYPE   IMAGE
gms         gms    polardbx/polardbx-engine-2.0:latest
dn          dn     polardbx/polardbx-engine-2.0:latest
cn          cn     polardbx/polardbx-sql:latest
cdc         cdc    polardbx/polardbx-cdc:latest

Show cluster events: kbcli cluster list-events -n default polardbx-tjxuol

2. Restart

➜ ~ kbcli cluster restart polardbx-tjxuol
Please type the name again(separate with white space when more than one): polardbx-tjxuol
OpsRequest polardbx-tjxuol-restart-tqb2c created successfully, you can view the progress:
    kbcli cluster describe-ops polardbx-tjxuol-restart-tqb2c -n default

➜ ~ kbcli cluster describe-ops polardbx-tjxuol-restart-tqb2c -n default
Spec:
  Name: polardbx-tjxuol-restart-tqb2c    NameSpace: default    Cluster: polardbx-tjxuol    Type: Restart

Command:
  kbcli cluster restart polardbx-tjxuol --components=gms,dn,cn,cdc --namespace=default

Status:
  Start Time:   Nov 03,2023 12:10 UTC+0800
  Duration:     28m
  Status:       Running
  Progress:     2/8
  OBJECT-KEY                  STATUS       DURATION   MESSAGE
  Pod/polardbx-tjxuol-cdc-0   Succeed      3m21s      Successfully restart: Pod/polardbx-tjxuol-cdc-0 in Component: cdc
  Pod/polardbx-tjxuol-cn-0    Succeed      3m4s       Successfully restart: Pod/polardbx-tjxuol-cn-0 in Component: cn
  Pod/polardbx-tjxuol-dn-1    Pending
  Pod/polardbx-tjxuol-dn-2    Pending
  Pod/polardbx-tjxuol-dn-0    Processing   28m        Start to restart: Pod/polardbx-tjxuol-dn-0 in Component: dn
  Pod/polardbx-tjxuol-gms-0   Pending
  Pod/polardbx-tjxuol-gms-2   Pending
  Pod/polardbx-tjxuol-gms-1   Processing   28m        Start to restart: Pod/polardbx-tjxuol-gms-1 in Component: gms

Conditions:
LAST-TRANSITION-TIME         TYPE          REASON                        STATUS   MESSAGE
Nov 03,2023 12:10 UTC+0800   Progressing   OpsRequestProgressingStarted  True     Start to process the OpsRequest: polardbx-tjxuol-restart-tqb2c in Cluster: polardbx-tjxuol
Nov 03,2023 12:10 UTC+0800   Validated     ValidateOpsRequestPassed      True     OpsRequest: polardbx-tjxuol-restart-tqb2c is validated
Nov 03,2023 12:10 UTC+0800   Restarting    RestartStarted                True     Start to restart database in Cluster: polardbx-tjxuol

Warning Events:

➜ ~ k describe sts polardbx-tjxuol-dn
Name:               polardbx-tjxuol-dn
Namespace:          default
CreationTimestamp:  Fri, 03 Nov 2023 11:58:23 +0800
Selector:           app.kubernetes.io/instance=polardbx-tjxuol,app.kubernetes.io/managed-by=kubeblocks,app.kubernetes.io/name=polardbx,apps.kubeblocks.io/component-name=dn
Labels:             app.kubernetes.io/component=dn
                    app.kubernetes.io/instance=polardbx-tjxuol
                    app.kubernetes.io/managed-by=kubeblocks
                    app.kubernetes.io/name=polardbx
                    apps.kubeblocks.io/component-name=dn
                    rsm.workloads.kubeblocks.io/controller-generation=2
Annotations:        config.kubeblocks.io/tpl-polardbx-scripts: polardbx-tjxuol-dn-polardbx-scripts
                    kubeblocks.io/generation: 1
Replicas:           3 desired 3 total
Update Strategy:    OnDelete
Pods Status:        3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/component=dn
                    app.kubernetes.io/instance=polardbx-tjxuol
                    app.kubernetes.io/managed-by=kubeblocks
                    app.kubernetes.io/name=polardbx
                    app.kubernetes.io/version=polardbx-v1.4.1
                    apps.kubeblocks.io/component-name=dn
                    apps.kubeblocks.io/workload-type=Consensus
  Annotations:      kubeblocks.io/restart: 2023-11-03T04:10:57Z
  Service Account:  kb-polardbx-tjxuol
  Init Containers:
   tools-updater:
    Image:      polardbx/xstore-tools:latest
    Port:
    Host Port:
    Command:
      /bin/ash
    Args:
      -c
      ./hack/update.sh /target
    Limits:
      cpu:     0
      memory:  0
    Environment Variables from:
      polardbx-tjxuol-dn-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:                (v1:metadata.name)
      KB_POD_UID:                 (v1:metadata.uid)
      KB_NAMESPACE:               (v1:metadata.namespace)
      KB_SA_NAME:                 (v1:spec.serviceAccountName)
      KB_NODENAME:                (v1:spec.nodeName)
      KB_HOST_IP:                 (v1:status.hostIP)
      KB_POD_IP:                  (v1:status.podIP)
      KB_POD_IPS:                 (v1:status.podIPs)
      KB_HOSTIP:                  (v1:status.hostIP)
      KB_PODIP:                   (v1:status.podIP)
      KB_PODIPS:                  (v1:status.podIPs)
      KB_CLUSTER_NAME:            polardbx-tjxuol
      KB_COMP_NAME:               dn
      KB_CLUSTER_COMP_NAME:       polardbx-tjxuol-dn
      KB_CLUSTER_UID_POSTFIX_8:   690c6c10
      KB_POD_FQDN:                $(KB_POD_NAME).$(KB_CLUSTER_COMP_NAME)-headless.$(KB_NAMESPACE).svc
      NODE_NAME:                  (v1:spec.nodeName)
    Mounts:
      /target from xstore-tools (rw)
   role-agent-installer:
    Image:      msoap/shell2http:1.16.0
    Port:
    Host Port:
    Command:
      cp
      /app/shell2http
      /role-probe/agent
    Environment:
    Mounts:
      /role-probe from role-agent (rw)
  Containers:
   engine:
    Image:       polardbx/polardbx-engine-2.0:latest
    Ports:       3306/TCP, 11306/TCP, 31600/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Command:
      /scripts/xstore-setup.sh
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     1
      memory:  1Gi
    Startup:   tcp-socket :mysql delay=20s timeout=30s period=10s #success=1 #failure=60
    Environment Variables from:
      polardbx-tjxuol-dn-env      ConfigMap  Optional: false
      polardbx-tjxuol-dn-rsm-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:                (v1:metadata.name)
      KB_POD_UID:                 (v1:metadata.uid)
      KB_NAMESPACE:               (v1:metadata.namespace)
      KB_SA_NAME:                 (v1:spec.serviceAccountName)
      KB_NODENAME:                (v1:spec.nodeName)
      KB_HOST_IP:                 (v1:status.hostIP)
      KB_POD_IP:                  (v1:status.podIP)
      KB_POD_IPS:                 (v1:status.podIPs)
      KB_HOSTIP:                  (v1:status.hostIP)
      KB_PODIP:                   (v1:status.podIP)
      KB_PODIPS:                  (v1:status.podIPs)
      KB_CLUSTER_NAME:            polardbx-tjxuol
      KB_COMP_NAME:               dn
      KB_CLUSTER_COMP_NAME:       polardbx-tjxuol-dn
      KB_CLUSTER_UID_POSTFIX_8:   690c6c10
      KB_POD_FQDN:                $(KB_POD_NAME).$(KB_CLUSTER_COMP_NAME)-headless.$(KB_NAMESPACE).svc
      LANG:                       en_US.utf8
      LC_ALL:                     en_US.utf8
      ENGINE:                     galaxy
      ENGINE_HOME:                /opt/galaxy_engine
      NODE_ROLE:                  candidate
      NODE_IP:                    (v1:status.hostIP)
      NODE_NAME:                  (v1:spec.nodeName)
      POD_IP:                     (v1:status.podIP)
      POD_NAME:                   (v1:metadata.name)
      LIMITS_CPU:                 1000 (limits.cpu)
      LIMITS_MEM:                 1073741824 (limits.memory)
      PORT_MYSQL:                 3306
      PORT_PAXOS:                 11306
      PORT_POLARX:                31600
      KB_SERVICE_USER:            polardbx_root
      KB_SERVICE_PASSWORD:        <set to the key 'password' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
      RSM_COMPATIBILITY_MODE:     true
    Mounts:
      /data-log/mysql from data-log (rw)
      /data/mysql from data (rw)
      /etc/podinfo from podinfo (rw)
      /scripts/xstore-post-start.sh from scripts (rw,path="xstore-post-start.sh")
      /scripts/xstore-setup.sh from scripts (rw,path="xstore-setup.sh")
      /tools/xstore from xstore-tools (rw)
   exporter:
    Image:      prom/mysqld-exporter:v0.14.0
    Port:       9104/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     0
      memory:  0
    Environment Variables from:
      polardbx-tjxuol-dn-env      ConfigMap  Optional: false
      polardbx-tjxuol-dn-rsm-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:                (v1:metadata.name)
      KB_POD_UID:                 (v1:metadata.uid)
      KB_NAMESPACE:               (v1:metadata.namespace)
      KB_SA_NAME:                 (v1:spec.serviceAccountName)
      KB_NODENAME:                (v1:spec.nodeName)
      KB_HOST_IP:                 (v1:status.hostIP)
      KB_POD_IP:                  (v1:status.podIP)
      KB_POD_IPS:                 (v1:status.podIPs)
      KB_HOSTIP:                  (v1:status.hostIP)
      KB_PODIP:                   (v1:status.podIP)
      KB_PODIPS:                  (v1:status.podIPs)
      KB_CLUSTER_NAME:            polardbx-tjxuol
      KB_COMP_NAME:               dn
      KB_CLUSTER_COMP_NAME:       polardbx-tjxuol-dn
      KB_CLUSTER_UID_POSTFIX_8:   690c6c10
      KB_POD_FQDN:                $(KB_POD_NAME).$(KB_CLUSTER_COMP_NAME)-headless.$(KB_NAMESPACE).svc
      MYSQL_MONITOR_USER:         <set to the key 'username' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
      MYSQL_MONITOR_PASSWORD:     <set to the key 'password' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
      DATA_SOURCE_NAME:           $(MYSQL_MONITOR_USER):$(MYSQL_MONITOR_PASSWORD)@(localhost:3306)/
    Mounts:
   kb-role-probe:
    Image:       registry.cn-hangzhou.aliyuncs.com/apecloud/kubeblocks-tools:0.7.0-beta.18
    Ports:       7373/TCP, 50101/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      lorry
      --port
      7373
      --grpcport
      50101
    Readiness:  exec [/bin/grpc_health_probe -addr=:50101] delay=0s timeout=1s period=2s #success=1 #failure=3
    Environment:
      KB_RSM_USERNAME:               <set to the key 'username' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
      KB_RSM_PASSWORD:               <set to the key 'password' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
      KB_RSM_ACTION_SVC_LIST:        [36501]
      KB_SERVICE_USER:               <set to the key 'username' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
      KB_SERVICE_PASSWORD:           <set to the key 'password' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
      KB_RSM_SERVICE_PORT:           3306
      KB_SERVICE_PORT:               3306
      KB_RSM_ROLE_UPDATE_MECHANISM:  DirectAPIServerEventUpdate
      KB_RSM_ROLE_PROBE_TIMEOUT:     1
      KB_POD_NAME:                   (v1:metadata.name)
      KB_NAMESPACE:                  (v1:metadata.namespace)
      KB_POD_UID:                    (v1:metadata.uid)
      KB_NODENAME:                   (v1:spec.nodeName)
      KB_SERVICE_CHARACTER_TYPE:     custom
    Mounts:
   action-0:
    Image:      arey/mysql-client:latest
    Port:
    Host Port:
    Command:
      /role-probe/agent -port 36501 -export-all-vars -form /role mysql -h127.0.0.1 -P3306 -uroot -N -B -e "select role from information_schema.alisql_cluster_local" xargs echo -n
    Environment:
      KB_RSM_USERNAME:  <set to the key 'username' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
      KB_RSM_PASSWORD:  <set to the key 'password' in secret 'polardbx-tjxuol-conn-credential'>  Optional: false
    Mounts:
      /role-probe from role-agent (rw)
  Volumes:
   xstore-tools:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
   podinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
      metadata.annotations -> annotations
      metadata.annotations['runmode'] -> runmode
      metadata.name -> name
      metadata.namespace -> namespace
   scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      polardbx-tjxuol-dn-polardbx-scripts
    Optional:  false
   data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
   data-log:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
   role-agent:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
Volume Claims:
  Name:          data
  StorageClass:  kb-default-sc
  Labels:        apps.kubeblocks.io/vct-name=data
  Annotations:
  Capacity:      20Gi
  Access Modes:  [ReadWriteOnce]
Events:
  Type     Reason               Age                 From                    Message
  Normal   SuccessfulCreate     41m                 statefulset-controller  create Claim data-polardbx-tjxuol-dn-0 Pod polardbx-tjxuol-dn-0 in StatefulSet polardbx-tjxuol-dn success
  Normal   SuccessfulCreate     41m                 statefulset-controller  create Claim data-polardbx-tjxuol-dn-1 Pod polardbx-tjxuol-dn-1 in StatefulSet polardbx-tjxuol-dn success
  Normal   SuccessfulCreate     41m                 statefulset-controller  create Pod polardbx-tjxuol-dn-1 in StatefulSet polardbx-tjxuol-dn successful
  Normal   SuccessfulCreate     41m                 statefulset-controller  create Claim data-polardbx-tjxuol-dn-2 Pod polardbx-tjxuol-dn-2 in StatefulSet polardbx-tjxuol-dn success
  Normal   SuccessfulCreate     41m                 statefulset-controller  create Pod polardbx-tjxuol-dn-2 in StatefulSet polardbx-tjxuol-dn successful
  Normal   SuccessfulCreate     28m (x2 over 41m)   statefulset-controller  create Pod polardbx-tjxuol-dn-0 in StatefulSet polardbx-tjxuol-dn successful
  Warning  RecreatingFailedPod  28m (x8 over 28m)   statefulset-controller  StatefulSet default/polardbx-tjxuol-dn is recreating failed Pod polardbx-tjxuol-dn-0
  Normal   SuccessfulDelete     28m (x8 over 28m)   statefulset-controller  delete Pod polardbx-tjxuol-dn-0 in StatefulSet polardbx-tjxuol-dn successful
➜ ~
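
The `action-0` container in the StatefulSet description above probes each member's role with a `mysql` query against `information_schema.alisql_cluster_local`. As a rough sketch of how the same query could be run by hand while the restart is stuck, assuming the `action-0` container accepts `kubectl exec` and that `root` can log in locally without a password as the probe command suggests; the helper names `pod_role` and `normalize_role` are hypothetical and not part of KubeBlocks or kbcli:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of KubeBlocks or kbcli.

# Run the probe's role query inside a pod's action-0 container.
# Needs kubectl access to the cluster; the passwordless root login is an
# assumption taken from the probe command shown in the issue.
pod_role() {
  local pod="$1" ns="${2:-default}"
  kubectl exec "$pod" -n "$ns" -c action-0 -- \
    mysql -h127.0.0.1 -P3306 -uroot -N -B \
    -e "select role from information_schema.alisql_cluster_local"
}

# Strip whitespace and lowercase the value read from stdin, so the result
# can be compared with the ROLE column of `kbcli cluster describe`.
normalize_role() {
  tr -d '[:space:]' | tr '[:upper:]' '[:lower:]'
}

# Example (commented out because it needs a live cluster):
# for i in 0 1 2; do
#   printf 'polardbx-tjxuol-dn-%s: ' "$i"
#   pod_role "polardbx-tjxuol-dn-$i" | normalize_role
#   echo
# done
```

A pod whose engine never comes up would be expected to fail the query rather than report a role, which may help tell a stuck member from a healthy follower.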

free6om commented 1 year ago

Seems something went wrong in the DB container:

2023-11-03 04:12:17,738 - GalaxyEngine - INFO - () start command: /opt/galaxy_engine/bin/mysqld_safe --defaults-file=/data/mysql/conf/my.cnf --loose-pod-name=polardbx-tjxuol-gms-1
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
2023-11-03T04:12:23.302469Z mysqld_safe Logging to '/data/mysql/log/alert.log'.
2023-11-03T04:12:23.414770Z mysqld_safe Starting mysqld daemon with databases from /data/mysql/data
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
wait mysql ready
ERROR 2003 (HY000): Can't connect to MySQL server on '127.1' (111)
wait mysql ready

free6om commented 1 year ago

The PolarDB-X gms and dn components currently depend on immutable IP addresses, which means their pods can't be rescheduled yet. Member reconfiguration needs to be configured to support restart; we will add it in 0.8 or later.
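
One way to observe the IP dependency described above is to record a pod's IP before and after it is recreated and compare the two values. This is a minimal sketch under stated assumptions, not something the add-on ships: `pod_ip` and `ip_changed` are hypothetical helper names, and the pod name is taken from the cluster in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of KubeBlocks or kbcli.

# Print the current IP of a pod. Needs kubectl access to the cluster.
pod_ip() {
  local pod="$1" ns="${2:-default}"
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{.status.podIP}'
}

# Exit 0 when the two recorded IPs differ, non-zero when they are equal.
ip_changed() {
  local before="$1" after="$2"
  [ "$before" != "$after" ]
}

# Example (commented out because it needs a live cluster):
# before=$(pod_ip polardbx-tjxuol-dn-0)
# kubectl delete pod polardbx-tjxuol-dn-0 -n default
# kubectl wait --for=condition=Ready pod/polardbx-tjxuol-dn-0 -n default --timeout=300s
# after=$(pod_ip polardbx-tjxuol-dn-0)
# if ip_changed "$before" "$after"; then
#   echo "pod IP changed: $before -> $after"
# fi
```

If the IP differs after the pod is recreated, the other members would still hold the old address, which is consistent with the restart never completing for gms and dn.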