apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0

[BUG]kafka volume-expand hang after RebuildInstance #7651

Closed ahjing99 closed 3 months ago

ahjing99 commented 3 months ago

➜ ~ kbcli version
Kubernetes: v1.29.4-gke.1043002
KubeBlocks: 0.9.0-beta.39
kbcli: 0.9.0-beta.27

1. Create cluster

  `kbcli cluster create kafka kafka-fwgwiy --mode='combined' --cpu=0.5 --memory=0.5 --storage=1 --availability-policy=none --termination-policy=DoNotTerminate --version=kafka-3.3.2 --storage-enable=true --meta-storage=1 --replicas=3 --namespace default`

Cluster kafka-fwgwiy created

  `kbcli cluster list-instances kafka-fwgwiy --namespace default`

| NAME | NAMESPACE | CLUSTER | COMPONENT | STATUS | ROLE | ACCESSMODE | AZ | CPU(REQUEST/LIMIT) | MEMORY(REQUEST/LIMIT) | STORAGE | NODE | CREATED-TIME |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| kafka-fwgwiy-broker-0 | default | kafka-fwgwiy | broker | Running | | | us-central1-c | 500m / 500m | 512Mi / 512Mi | data:1Gi, metadata:1Gi | gke-yjtest-default-pool-2619f239-mmwv/10.128.0.34 | Jun 27,2024 14:57 UTC+0800 |
| kafka-fwgwiy-broker-1 | default | kafka-fwgwiy | broker | Running | | | us-central1-c | 500m / 500m | 512Mi / 512Mi | data:1Gi, metadata:1Gi | gke-yjtest-default-pool-2619f239-4tr1/10.128.0.33 | Jun 27,2024 14:57 UTC+0800 |
| kafka-fwgwiy-broker-2 | default | kafka-fwgwiy | broker | Running | | | us-central1-c | 500m / 500m | 512Mi / 512Mi | data:1Gi, metadata:1Gi | gke-yjtest-default-pool-2619f239-vflk/10.128.0.31 | Jun 27,2024 14:57 UTC+0800 |
| kafka-fwgwiy-metrics-exp-0 | default | kafka-fwgwiy | metrics-exp | Running | | | us-central1-c | 500m / 500m | 512Mi / 512Mi | | gke-yjtest-default-pool-2619f239-4tr1/10.128.0.33 | Jun 27,2024 14:57 UTC+0800 |

2. Rebuild instance

    apiVersion: apps.kubeblocks.io/v1alpha1
    kind: OpsRequest
    metadata:
      generateName: kafka-fwgwiy-rebuildinstance-
      namespace: default
    spec:
      type: RebuildInstance
      clusterRef: kafka-fwgwiy
      force: true
      rebuildFrom:

check cluster status before ops
check cluster status done
cluster_status:Running

  `kubectl create -f test_ops_cluster_kafka-fwgwiy.yaml`

  `kbcli cluster list-instances kafka-fwgwiy --namespace default`

| NAME | NAMESPACE | CLUSTER | COMPONENT | STATUS | ROLE | ACCESSMODE | AZ | CPU(REQUEST/LIMIT) | MEMORY(REQUEST/LIMIT) | STORAGE | NODE | CREATED-TIME |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| kafka-fwgwiy-broker-0 | default | kafka-fwgwiy | broker | Running | | | us-central1-c | 500m / 500m | 512Mi / 512Mi | data:1Gi, metadata:1Gi | gke-yjtest-default-pool-2619f239-mmwv/10.128.0.34 | Jun 27,2024 14:59 UTC+0800 |
| kafka-fwgwiy-broker-1 | default | kafka-fwgwiy | broker | Running | | | us-central1-c | 500m / 500m | 512Mi / 512Mi | data:1Gi, metadata:1Gi | gke-yjtest-default-pool-2619f239-4tr1/10.128.0.33 | Jun 27,2024 14:57 UTC+0800 |
| kafka-fwgwiy-broker-2 | default | kafka-fwgwiy | broker | Running | | | us-central1-c | 500m / 500m | 512Mi / 512Mi | data:1Gi, metadata:1Gi | gke-yjtest-default-pool-2619f239-vflk/10.128.0.31 | Jun 27,2024 14:57 UTC+0800 |
| kafka-fwgwiy-metrics-exp-0 | default | kafka-fwgwiy | metrics-exp | Running | | | us-central1-c | 500m / 500m | 512Mi / 512Mi | | gke-yjtest-default-pool-2619f239-4tr1/10.128.0.33 | Jun 27,2024 14:57 UTC+0800 |
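The `rebuildFrom` list in the OpsRequest above is cut off in this report, so the rebuilt instance is not named explicitly. As a rough sketch, comparing the CREATED-TIME column of the two `list-instances` outputs suggests which pod was recreated. The dictionaries below are copied by hand from those outputs; the helper name `parse_created` is only illustrative.

```python
from datetime import datetime

# CREATED-TIME values copied from the list-instances output before the rebuild.
before = {
    "kafka-fwgwiy-broker-0": "Jun 27,2024 14:57",
    "kafka-fwgwiy-broker-1": "Jun 27,2024 14:57",
    "kafka-fwgwiy-broker-2": "Jun 27,2024 14:57",
    "kafka-fwgwiy-metrics-exp-0": "Jun 27,2024 14:57",
}

# CREATED-TIME values copied from the list-instances output after the rebuild.
after = dict(before)
after["kafka-fwgwiy-broker-0"] = "Jun 27,2024 14:59"


def parse_created(value: str) -> datetime:
    """Parse the timestamp part of kbcli's CREATED-TIME column (UTC offset dropped)."""
    return datetime.strptime(value, "%b %d,%Y %H:%M")


# An instance whose creation time moved forward was recreated in between.
rebuilt = sorted(
    name for name in after
    if parse_created(after[name]) > parse_created(before[name])
)
print(rebuilt)
```

With the values above, only `kafka-fwgwiy-broker-0` has a newer creation time, which is consistent with it being the rebuilt instance.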


3. volume-expand hang

  `kbcli cluster volume-expand kafka-fwgwiy --auto-approve --force=true --components broker --volume-claim-templates data --storage 3Gi --namespace default`

OpsRequest kafka-fwgwiy-volumeexpansion-g2mxf created successfully, you can view the progress:
  `kbcli cluster describe-ops kafka-fwgwiy-volumeexpansion-g2mxf -n default`

➜ ~ kbcli cluster describe-ops kafka-fwgwiy-volumeexpansion-g2mxf -n default

Spec:
  Name: kafka-fwgwiy-volumeexpansion-g2mxf
  NameSpace: default
  Cluster: kafka-fwgwiy
  Type: VolumeExpansion

Command: kbcli cluster volume-expand kafka-fwgwiy --components=broker --volume-claim-template-names=data --storage=3Gi --namespace=default

Last Configuration: COMPONENT VOLUME-CLAIM-TEMPLATE STORAGE

Status:
  Start Time: Jun 27,2024 15:04 UTC+0800
  Duration: 7m29s
  Status: Running
  Progress: 2/3

| OBJECT-KEY | STATUS | DURATION | MESSAGE |
| --- | --- | --- | --- |
| PVC/data-kafka-fwgwiy-broker-1(data) | Succeed | | Successfully expand volume: PVC/data-kafka-fwgwiy-broker-1 in component: broker |
| PVC/data-kafka-fwgwiy-broker-2(data) | Succeed | | Successfully expand volume: PVC/data-kafka-fwgwiy-broker-2 in component: broker |

Conditions:

| LAST-TRANSITION-TIME | TYPE | REASON | STATUS | MESSAGE |
| --- | --- | --- | --- | --- |
| Jun 27,2024 15:04 UTC+0800 | WaitForProgressing | WaitForProgressing | True | wait for the controller to process the OpsRequest: kafka-fwgwiy-volumeexpansion-g2mxf in Cluster: kafka-fwgwiy |
| Jun 27,2024 15:04 UTC+0800 | Validated | ValidateOpsRequestPassed | True | OpsRequest: kafka-fwgwiy-volumeexpansion-g2mxf is validated |
| Jun 27,2024 15:04 UTC+0800 | VolumeExpanding | VolumeExpansionStarted | True | Start to expand the volumes in Cluster: kafka-fwgwiy |

Warning Events:
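The progress is stuck at 2/3, and only the PVCs of broker-1 and broker-2 are reported as expanded. The following sketch works out which PVC is missing from the progress list, assuming the PVC names follow the `<volume-claim-template>-<cluster>-<component>-<ordinal>` pattern seen in the OBJECT-KEY column. The variable names are illustrative and the values are copied from the output above.

```python
# Values taken from the volume-expand command and the describe-ops output.
cluster = "kafka-fwgwiy"
component = "broker"
template = "data"
replicas = 3

# Assumption: one PVC per replica, named <template>-<cluster>-<component>-<ordinal>.
expected = [f"{template}-{cluster}-{component}-{i}" for i in range(replicas)]

# PVCs that describe-ops reports with STATUS "Succeed".
succeeded = {
    "data-kafka-fwgwiy-broker-1",
    "data-kafka-fwgwiy-broker-2",
}

# Whatever is expected but not yet reported as succeeded is still pending.
pending = [pvc for pvc in expected if pvc not in succeeded]
print(pending)
```

Under that naming assumption the only PVC left is `data-kafka-fwgwiy-broker-0`, which would be the data volume of the pod with the newer creation time in step 2.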

    2024-06-27T06:25:08.051Z INFO ConfigurationReconcile failed to run configuration reconcile task. {"controller": "configuration", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Configuration", "Configuration": {"name":"kafka-fwgwiy-broker","namespace":"default"}, "namespace": "default", "name": "kafka-fwgwiy-broker", "reconcileID": "775fa0bb-a9da-47c3-b7c6-c875c894c116", "configuration": {"name":"kafka-fwgwiy-broker","namespace":"default"}}
    2024-06-27T06:25:08.051Z ERROR Reconciler error {"controller": "configuration", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Configuration", "Configuration": {"name":"kafka-fwgwiy-broker","namespace":"default"}, "namespace": "default", "name": "kafka-fwgwiy-broker", "reconcileID": "775fa0bb-a9da-47c3-b7c6-c875c894c116", "error": "has no Credential object admin found when resolving vars"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:329
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227
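When scanning a long controller log such as the attached `kb.txt`, it can help to pull the structured fields out of each entry. The sketch below parses the ERROR entry quoted above, assuming each entry is a timestamp, a level, a free-text message, and a trailing JSON object on one line. The function name `parse_log_line` is hypothetical and not part of KubeBlocks.

```python
import json

# The ERROR entry from the log above, reassembled as a single line.
line = (
    '2024-06-27T06:25:08.051Z ERROR Reconciler error '
    '{"controller": "configuration", "controllerGroup": "apps.kubeblocks.io", '
    '"controllerKind": "Configuration", '
    '"Configuration": {"name":"kafka-fwgwiy-broker","namespace":"default"}, '
    '"namespace": "default", "name": "kafka-fwgwiy-broker", '
    '"reconcileID": "775fa0bb-a9da-47c3-b7c6-c875c894c116", '
    '"error": "has no Credential object admin found when resolving vars"}'
)


def parse_log_line(raw: str) -> dict:
    """Split a log entry into timestamp, level, message, and JSON fields."""
    # Everything before the first "{" is the plain-text prefix.
    head, brace, rest = raw.partition("{")
    timestamp, level, *message = head.split()
    # Entries without a JSON payload yield an empty field map.
    fields = json.loads(brace + rest) if brace else {}
    return {
        "timestamp": timestamp,
        "level": level,
        "message": " ".join(message),
        "fields": fields,
    }


entry = parse_log_line(line)
print(entry["level"], entry["fields"]["Configuration"]["name"], entry["fields"]["error"])
```

For this entry the extracted fields point at the Configuration object `kafka-fwgwiy-broker` in namespace `default` and the error `has no Credential object admin found when resolving vars`.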



➜  ~ k logs kubeblocks-5669897bf5-nnz75 -n kb-system >kb.txt
Defaulted container "manager" out of: manager, tools (init), datascript (init)
[kb.txt](https://github.com/user-attachments/files/16009564/kb.txt)