apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0

[BUG]Hscale after hscaleoffinstance failed for mysql with smart-engine #7652

Closed ahjing99 closed 3 months ago

ahjing99 commented 3 months ago

➜ ~ kbcli version
Kubernetes: v1.29.4-gke.1043002
KubeBlocks: 0.9.0-beta.39
kbcli: 0.9.0-beta.27

Hscale after hscaleoffinstance also failed in the CI workflow, at a different stage and for a different reason: https://github.com/apecloud/kubeblocks/actions/runs/9676387823/job/26697239490

  1. Create cluster
    
    `kbcli cluster create  smarte-xihksa --termination-policy=WipeOut --cluster-definition=apecloud-mysql --enable-all-logs=false --cluster-version=ac-mysql-8.0.30-1 --set cpu=500m,memory=1Gi,replicas=3,storage=20Gi  `

Cluster smarte-xihksa created

kbcli cluster configure smarte-xihksa --auto-approve --force=true --set loose_smartengine=ON,binlog_format=ROW,default_storage_engine=smartengine --components mysql --config-spec mysql-consensusset-config --config-file my.cnf

Will updated configure file meta: ConfigSpec: mysql-consensusset-config ConfigFile: my.cnf ComponentName: mysql ClusterName: smarte-xihksa
OpsRequest smarte-xihksa-reconfiguring-6qjjt created successfully, you can view the progress: kbcli cluster describe-ops smarte-xihksa-reconfiguring-6qjjt

➜ ~ kbcli cluster describe smarte-xihksa
Name: smarte-xihksa    Created Time: Jun 27,2024 15:25 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION             STATUS    TERMINATION-POLICY
default     apecloud-mysql       ac-mysql-8.0.30-1   Running   WipeOut

Endpoints:
COMPONENT   MODE        INTERNAL                                             EXTERNAL
mysql       ReadWrite   smarte-xihksa-mysql.default.svc.cluster.local:3306

Topology:
COMPONENT   INSTANCE                ROLE       STATUS    AZ              NODE                                                CREATED-TIME
mysql       smarte-xihksa-mysql-0   follower   Running   us-central1-c   gke-yjtest-default-pool-2619f239-4tr1/10.128.0.33   Jun 27,2024 15:31 UTC+0800
mysql       smarte-xihksa-mysql-1   follower   Running   us-central1-c   gke-yjtest-default-pool-2619f239-mmwv/10.128.0.34   Jun 27,2024 15:32 UTC+0800
mysql       smarte-xihksa-mysql-2   leader     Running   us-central1-c   gke-yjtest-default-pool-2619f239-vflk/10.128.0.31   Jun 27,2024 15:30 UTC+0800

Resources Allocation:
COMPONENT   DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
mysql       false       500m / 500m          1Gi / 1Gi               data:20Gi      kb-default-sc

Images:
COMPONENT   TYPE    IMAGE
mysql       mysql   infracreate-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/apecloud-mysql-server:8.0.30-5.beta3.20231215.ge77d836.14

Data Protection:
BACKUP-REPO   AUTO-BACKUP   BACKUP-SCHEDULE   BACKUP-METHOD   BACKUP-RETENTION   RECOVERABLE-TIME

Show cluster events: kbcli cluster list-events -n default smarte-xihksa

2. hscaleoffinstance

apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  generateName: smarte-xihksa-hscaleoffinstance-
  labels:
    app.kubernetes.io/instance: smarte-xihksa
    app.kubernetes.io/managed-by: kubeblocks
  namespace: default
spec:
  type: HorizontalScaling
  clusterRef: smarte-xihksa
  force: true
  horizontalScaling:

➜ ~ kbcli cluster describe smarte-xihksa
Name: smarte-xihksa    Created Time: Jun 27,2024 15:25 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION             STATUS    TERMINATION-POLICY
default     apecloud-mysql       ac-mysql-8.0.30-1   Running   WipeOut

Endpoints:
COMPONENT   MODE        INTERNAL                                             EXTERNAL
mysql       ReadWrite   smarte-xihksa-mysql.default.svc.cluster.local:3306

Topology:
COMPONENT   INSTANCE                ROLE       STATUS    AZ              NODE                                                CREATED-TIME
mysql       smarte-xihksa-mysql-1   follower   Running   us-central1-c   gke-yjtest-default-pool-2619f239-mmwv/10.128.0.34   Jun 27,2024 15:32 UTC+0800
mysql       smarte-xihksa-mysql-2   leader     Running   us-central1-c   gke-yjtest-default-pool-2619f239-vflk/10.128.0.31   Jun 27,2024 15:30 UTC+0800

Resources Allocation:
COMPONENT   DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
mysql       false       500m / 500m          1Gi / 1Gi               data:20Gi      kb-default-sc

Images:
COMPONENT   TYPE    IMAGE
mysql       mysql   infracreate-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/apecloud-mysql-server:8.0.30-5.beta3.20231215.ge77d836.14

Data Protection:
BACKUP-REPO   AUTO-BACKUP   BACKUP-SCHEDULE   BACKUP-METHOD   BACKUP-RETENTION   RECOVERABLE-TIME

Show cluster events: kbcli cluster list-events -n default smarte-xihksa


3. Hscale

➜ ~ kbcli cluster hscale smarte-xihksa --auto-approve --force=true --components mysql --replicas 5
OpsRequest smarte-xihksa-horizontalscaling-vql8d created successfully, you can view the progress:
	kbcli cluster describe-ops smarte-xihksa-horizontalscaling-vql8d -n default

➜ ~ k describe pod restore-preparedata-4dda3270-smarte-xihksa-mysql-scaling-0mzzcs
Name:             restore-preparedata-4dda3270-smarte-xihksa-mysql-scaling-0mzzcs
Namespace:        default
Priority:         0
Node:             gke-yjtest-default-pool-2619f239-4tr1/10.128.0.33
Start Time:       Thu, 27 Jun 2024 15:36:10 +0800
Labels:           app.kubernetes.io/instance=smarte-xihksa
                  app.kubernetes.io/managed-by=kubeblocks-dataprotection
                  app.kubernetes.io/name=apecloud-mysql
                  apps.kubeblocks.io/component-name=mysql
                  apps.kubeblocks.io/vct-name=data
                  batch.kubernetes.io/controller-uid=17810edd-d55d-457d-a6ca-3f5f9ea9f309
                  batch.kubernetes.io/job-name=restore-preparedata-4dda3270-smarte-xihksa-mysql-scaling-0
                  controller-uid=17810edd-d55d-457d-a6ca-3f5f9ea9f309
                  dataprotection.kubeblocks.io/restore=smarte-xihksa-mysql-50025482-preparedata-2
                  job-name=restore-preparedata-4dda3270-smarte-xihksa-mysql-scaling-0
                  kubeblocks.io/volume-type=data
Annotations:
Status:           Pending
IP:
IPs:
Controlled By:    Job/restore-preparedata-4dda3270-smarte-xihksa-mysql-scaling-0
Init Containers:
  dp-copy-datasafed:
    Container ID:
    Image:          docker.io/apecloud/datasafed:0.2.0
    Image ID:
    Port:
    Host Port:
    Command:
      /bin/sh -c /scripts/install-datasafed.sh /bin/datasafed
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Environment:
    Mounts:
      /bin/datasafed from dp-datasafed-bin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dqc4v (ro)
Containers:
  restore:
    Container ID:
    Image:          infracreate-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/apecloud-xtrabackup:8.0
    Image ID:
    Port:
    Host Port:
    Command:
      sh -c
      #!/bin/bash
      set -e
      set -o pipefail
      export PATH="$PATH:$DP_DATASAFED_BIN_PATH"
      export DATASAFED_BACKEND_BASE_PATH="$DP_BACKUP_BASE_PATH"
      mkdir -p ${DATA_DIR}
      TMP_DIR=${DATA_MOUNT_DIR}/temp
      mkdir -p ${TMP_DIR} && cd ${TMP_DIR}

      old_signal="apecloud-mysql.old"
      log_bin=${LOG_BIN}
      if [ "$(datasafed list ${old_signal})" == "${old_signal}" ]; then
         log_bin="${DATA_DIR}/mysql-bin"
      fi

      datasafed pull "${DP_BACKUP_NAME}.xbstream" - | xbstream -x
      xtrabackup --decompress --remove-original --target-dir=${TMP_DIR}
      xtrabackup --prepare --target-dir=${TMP_DIR}
      xtrabackup --move-back --target-dir=${TMP_DIR} --datadir=${DATA_DIR}/ --log-bin=${log_bin}
      touch ${DATA_DIR}/${SIGNAL_FILE}
      rm -rf ${TMP_DIR}
      chmod -R 0777 ${DATA_DIR}

    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Environment:
      DP_BACKUP_NAME:         smarte-xihksa-mysql-scaling
      DP_BACKUP_BASE_PATH:    /default/smarte-xihksa-50025482-37ad-4ed5-9d0b-16cf81712511/mysql/smarte-xihksa-mysql-scaling
      DP_BACKUP_STOP_TIME:    2024-06-27T07:36:01Z
      DATA_DIR:               /data/mysql/data
      LOG_BIN:                /data/mysql/binlog/mysql-bin
      DP_DB_PORT:             3306
      DATA_MOUNT_DIR:         /data/mysql
      SIGNAL_FILE:            .xtrabackup_restore
      DP_DATASAFED_BIN_PATH:  /bin/datasafed
    Mounts:
      /bin/datasafed from dp-datasafed-bin (rw)
      /data/mysql from dp-claim-tpl-data-smarte-xihksa-mysql-2 (rw)
      /etc/datasafed from dp-datasafed-config (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dqc4v (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   False
  Initialized                 False
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  dp-claim-tpl-data-smarte-xihksa-mysql-2:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-smarte-xihksa-mysql-2
    ReadOnly:   false
  dp-datasafed-config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  tool-config-backuprepo-kbcli-test-gxqmbf
    Optional:    false
  dp-datasafed-bin:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  kube-api-access-dqc4v:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:
Tolerations:                 kb-data=true:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason              Age    From                     Message
  Normal   Scheduled           7m19s  default-scheduler        Successfully assigned default/restore-preparedata-4dda3270-smarte-xihksa-mysql-scaling-0mzzcs to gke-yjtest-default-pool-2619f239-4tr1
  Warning  FailedAttachVolume  7m19s  attachdetach-controller  Multi-Attach error for volume "pvc-f7a57f12-722f-4506-bf7e-714726955a53" Volume is already used by pod(s) smarte-xihksa-mysql-2
  Warning  FailedMount         7m17s  kubelet                  MountVolume.SetUp failed for volume "dp-datasafed-config" : failed to sync secret cache: timed out waiting for the condition

➜ ~ k get pod | grep smarte
restore-preparedata-4dda3270-smarte-xihksa-mysql-scaling-0mzzcs   0/1   Init:0/1   0   7m42s
smarte-xihksa-mysql-1                                             4/4   Running    0   11m
smarte-xihksa-mysql-2                                             4/4   Running    0   13m
➜ ~ k get pvc | grep smarte
data-smarte-xihksa-mysql-1   Bound   pvc-1f7a5834-e3b5-44e0-b95b-041c18da7213   20Gi   RWO   kb-default-sc   18m
data-smarte-xihksa-mysql-2   Bound   pvc-f7a57f12-722f-4506-bf7e-714726955a53   20Gi   RWO   kb-default-sc   18m
data-smarte-xihksa-mysql-3   Bound   pvc-ff9bce5b-7115-40d2-9338-422ca036b75d   20Gi   RWO   kb-default-sc   7m48s
data-smarte-xihksa-mysql-4   Bound   pvc-c65ec2ea-f359-40ce-994a-4bf605a233bf   20Gi   RWO   kb-default-sc   7m48s
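
For anyone triaging a similar hang: the volume named in the Multi-Attach event above, pvc-f7a57f12-722f-4506-bf7e-714726955a53, is the one bound to data-smarte-xihksa-mysql-2 in the PVC listing. That is the RWO claim of a running instance, not one of the new claims (mysql-3, mysql-4). The lookup can be scripted roughly as below. This is only a sketch: `multi_attach_volume` and `claim_for_volume` are hypothetical helper names, not part of kubectl or kbcli, and the second one assumes the volume name is the third whitespace-separated column, as in the `k get pvc` output above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of kubectl or kbcli) for mapping the volume in a
# Multi-Attach event back to the PVC that owns it.

# Read event text on stdin and print the first quoted volume name that follows
# "Multi-Attach error for volume".
multi_attach_volume() {
  sed -n 's/.*Multi-Attach error for volume "\([^"]*\)".*/\1/p' | head -n 1
}

# Read PVC rows on stdin and print the claim name (column 1) of every row whose
# volume (column 3, an assumption based on the listing in this issue) equals $1.
claim_for_volume() {
  awk -v vol="$1" '$3 == vol { print $1 }'
}
```

Against a live cluster this could be used as `vol=$(kubectl describe pod <restore-pod> | multi_attach_volume)` followed by `kubectl get pvc --no-headers | claim_for_volume "$vol"`. If the claim printed belongs to a running instance, the restore job cannot attach it from a different node.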



➜  ~ k logs kubeblocks-5669897bf5-nnz75 -n kb-system >kb.txt
Defaulted container "manager" out of: manager, tools (init), datascript (init)

[kb.txt](https://github.com/user-attachments/files/16009988/kb.txt)
free6om commented 3 months ago

fixed by #7645