apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0

[BUG]redis volume expand hang #7604

Closed ahjing99 closed 5 months ago

ahjing99 commented 5 months ago

    ➜ ~ kbcli version
    Kubernetes: v1.27.11-gke.1062004
    KubeBlocks: 0.9.0-beta.35
    kbcli: 0.9.0-beta.27

The failure is intermittent; it does not reproduce on every run.

  1. Create the cluster

    ➜  ~ k get cluster redis-qjdsth -o yaml
    apiVersion: apps.kubeblocks.io/v1alpha1
    kind: Cluster
    metadata:
      annotations:
        kubeblocks.io/ops-request: '[{"name":"redis-qjdsth-volumeexpansion-vpjmw","type":"VolumeExpansion","queueBySelf":true}]'
        kubeblocks.io/reconcile: "2024-06-21T06:39:19.609022834Z"
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"apps.kubeblocks.io/v1alpha1","kind":"Cluster","metadata":{"annotations":{},"name":"redis-qjdsth","namespace":"default"},"spec":{"componentSpecs":[{"componentDef":"redis-7","name":"redis","replicas":2,"resources":{"limits":{"cpu":"100m","memory":"0.5Gi"},"requests":{"cpu":"100m","memory":"0.5Gi"}},"switchPolicy":{"type":"Noop"},"volumeClaimTemplates":[{"name":"data","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"1Gi"}},"storageClassName":null}}]},{"componentDef":"redis-sentinel-7","name":"redis-sentinel","replicas":3,"resources":{"limits":{"cpu":"100m","memory":"0.5Gi"},"requests":{"cpu":"100m","memory":"0.5Gi"}},"volumeClaimTemplates":[{"name":"data","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"1Gi"}},"storageClassName":null}}]},{"componentDef":"redis-twemproxy-0.5","name":"redis-twemproxy","replicas":3,"resources":{"limits":{"cpu":"100m","memory":"0.5Gi"},"requests":{"cpu":"100m","memory":"0.5Gi"}},"volumeClaimTemplates":[{"name":"data","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"1Gi"}},"storageClassName":null}}]}],"terminationPolicy":"Halt"}}
      creationTimestamp: "2024-06-21T05:52:02Z"
      finalizers:
      - cluster.kubeblocks.io/finalizer
      generation: 11
      labels:
        app.kubernetes.io/instance: redis-qjdsth
      name: redis-qjdsth
      namespace: default
      resourceVersion: "97873"
      uid: af8a1a8e-2fb2-4d3b-b370-b6c060f36434
    spec:
      componentSpecs:
      - componentDef: redis-7
        name: redis
        offlineInstances:
        - redis-qjdsth-redis-0
        replicas: 3
        resources:
          limits:
            cpu: 200m
            memory: 644245094400m
          requests:
            cpu: 200m
            memory: 644245094400m
        serviceVersion: 7.2.4
        switchPolicy:
          type: Noop
        volumeClaimTemplates:
        - name: data
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 6Gi
      - componentDef: redis-sentinel-7
        name: redis-sentinel
        replicas: 3
        resources:
          limits:
            cpu: 100m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 512Mi
        serviceVersion: 7.2.4
        volumeClaimTemplates:
        - name: data
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi
      - componentDef: redis-twemproxy-0.5
        name: redis-twemproxy
        replicas: 3
        resources:
          limits:
            cpu: 100m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 512Mi
        serviceVersion: 0.5.0
        volumeClaimTemplates:
        - name: data
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi
      resources:
        cpu: "0"
        memory: "0"
      services:
      - componentSelector: redis
        name: redis-internet
        roleSelector: primary
        serviceName: redis-internet
        spec:
          ports:
          - name: redis
            nodePort: 30337
            port: 6379
            protocol: TCP
            targetPort: redis
          type: LoadBalancer
      storage:
        size: "0"
      terminationPolicy: Halt
    status:
      components:
        redis:
          phase: Running
          podsReady: true
          podsReadyTime: "2024-06-21T06:12:40Z"
        redis-sentinel:
          phase: Running
          podsReady: true
          podsReadyTime: "2024-06-21T06:12:07Z"
        redis-twemproxy:
          phase: Running
          podsReady: true
          podsReadyTime: "2024-06-21T06:12:06Z"
      conditions:
      - lastTransitionTime: "2024-06-21T05:52:02Z"
        message: 'The operator has started the provisioning of Cluster: redis-qjdsth'
        observedGeneration: 11
        reason: PreCheckSucceed
        status: "True"
        type: ProvisioningStarted
      - lastTransitionTime: "2024-06-21T05:55:29Z"
        message: Successfully applied for resources
        observedGeneration: 11
        reason: ApplyResourcesSucceed
        status: "True"
        type: ApplyResources
      - lastTransitionTime: "2024-06-21T06:12:40Z"
        message: all pods of components are ready, waiting for the probe detection successful
        reason: AllReplicasReady
        status: "True"
        type: ReplicasReady
      - lastTransitionTime: "2024-06-21T06:12:40Z"
        message: 'Cluster: redis-qjdsth is ready, current phase is Running'
        reason: ClusterReady
        status: "True"
        type: Ready
      observedGeneration: 11
      phase: Running

  2. Expand the volume. Two redis pods expand successfully while the third hangs.

    kbcli cluster volume-expand redis-qjdsth --auto-approve --force=true --components redis --volume-claim-templates data --storage 6Gi --namespace default

    OpsRequest redis-qjdsth-volumeexpansion-vpjmw created successfully, you can view the progress:
        kbcli cluster describe-ops redis-qjdsth-volumeexpansion-vpjmw -n default

    ➜ ~ kbcli cluster describe-ops redis-qjdsth-volumeexpansion-vpjmw -n default
    Spec:
      Name: redis-qjdsth-volumeexpansion-vpjmw   NameSpace: default   Cluster: redis-qjdsth   Type: VolumeExpansion

    Command:
      kbcli cluster volume-expand redis-qjdsth --components=redis --volume-claim-template-names=data --storage=6Gi --namespace=default

    Last Configuration:
    COMPONENT   VOLUME-CLAIM-TEMPLATE   STORAGE

    Status:
      Start Time:         Jun 21,2024 14:12 UTC+0800
      Completion Time:    Jun 21,2024 14:42 UTC+0800
      Duration:           30m
      Status:             Failed
      Progress:           2/3
                          OBJECT-KEY                            STATUS    DURATION   MESSAGE
                          PVC/data-redis-qjdsth-redis-2(data)   Succeed              Successfully expand volume: PVC/data-redis-qjdsth-redis-2 in component: redis
                          PVC/data-redis-qjdsth-redis-1(data)   Succeed              Successfully expand volume: PVC/data-redis-qjdsth-redis-1 in component: redis

    Conditions:
    LAST-TRANSITION-TIME         TYPE                 REASON                     STATUS   MESSAGE
    Jun 21,2024 14:12 UTC+0800   WaitForProgressing   WaitForProgressing         True     wait for the controller to process the OpsRequest: redis-qjdsth-volumeexpansion-vpjmw in Cluster: redis-qjdsth
    Jun 21,2024 14:12 UTC+0800   Validated            ValidateOpsRequestPassed   True     OpsRequest: redis-qjdsth-volumeexpansion-vpjmw is validated
    Jun 21,2024 14:12 UTC+0800   VolumeExpanding      VolumeExpansionStarted     True     Start to expand the volumes in Cluster: redis-qjdsth
    Jun 21,2024 14:42 UTC+0800   Failed               OpsRequestFailed           False    Timed out waiting for volume expansion to complete, the timeout value is 30 minutes

    Warning Events:
    TIME                         TYPE      REASON             OBJECT                                          MESSAGE
    Jun 21,2024 14:42 UTC+0800   Warning   OpsRequestFailed   OpsRequest/redis-qjdsth-volumeexpansion-vpjmw   Timed out waiting for volume expansion to complete, the timeout value is 30 minutes

    ➜ ~ k get pvc | grep redis-qjdsth
    data-redis-qjdsth-redis-1             Bound   pvc-6b06ecfb-7414-4a71-a681-ce18b4cbbfc6   6Gi   RWO   kb-default-sc   52m
    data-redis-qjdsth-redis-2             Bound   pvc-411711a2-6005-4507-97ce-99066ebf81a8   6Gi   RWO   kb-default-sc   48m
    data-redis-qjdsth-redis-3             Bound   pvc-587a9eb5-1bfd-4e1f-8402-0bba9f8f1367   1Gi   RWO   kb-default-sc   48m
    data-redis-qjdsth-redis-sentinel-0    Bound   pvc-90e76298-7cbf-4fd9-9c96-fd11b0a5e64e   1Gi   RWO   kb-default-sc   52m
    data-redis-qjdsth-redis-sentinel-1    Bound   pvc-5be45130-a760-4277-b718-24415019770c   1Gi   RWO   kb-default-sc   52m
    data-redis-qjdsth-redis-sentinel-2    Bound   pvc-c52776e5-a2cf-4d3f-9068-24264d87d028   1Gi   RWO   kb-default-sc   52m
    data-redis-qjdsth-redis-twemproxy-0   Bound   pvc-6ce24656-d359-4b37-bdde-2bcd93b48eb0   1Gi   RWO   kb-default-sc   52m
    data-redis-qjdsth-redis-twemproxy-1   Bound   pvc-a06ac4f3-dbf0-4278-ac02-70cca88c695a   1Gi   RWO   kb-default-sc   52m
    data-redis-qjdsth-redis-twemproxy-2   Bound   pvc-d531ff44-86bc-4b92-8c31-c2cda4eebcea   1Gi   RWO   kb-default-sc   52m
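
For anyone retesting, a small filter can make the stuck PVC easier to spot than scanning the full listing by eye. The following is only a sketch: `unexpanded_pvcs` is a hypothetical helper name (not part of kbcli or kubectl), and it assumes the PVCs are Bound and named `data-<cluster>-<component>-<ordinal>`, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not part of kbcli or kubectl): reads `kubectl get pvc`
# rows on stdin and prints the PVCs of one component whose CAPACITY column
# (4th field) differs from the requested size.
# Assumptions: the volume claim template is named "data", PVC names follow
# data-<cluster>-<component>-<ordinal>, and every PVC is Bound (so the
# VOLUME and CAPACITY columns are populated).
unexpanded_pvcs() {
  cluster=$1 component=$2 want=$3
  awk -v prefix="data-${cluster}-${component}-" -v want="$want" '
    # Keep rows whose name is the prefix followed only by digits, so that
    # redis-sentinel and redis-twemproxy PVCs are not matched for "redis".
    index($1, prefix) == 1 &&
    substr($1, length(prefix) + 1) ~ /^[0-9]+$/ &&
    $4 != want { print $1, $4 }
  '
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pvc -n default --no-headers | unexpanded_pvcs redis-qjdsth redis 6Gi
```

Fed the rows from the listing above, this prints `data-redis-qjdsth-redis-3 1Gi`, the one redis PVC that was not expanded to 6Gi.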


➜  ~ k logs kubeblocks-69895c69fc-dfs2f -n kb-system >kb.txt
Defaulted container "manager" out of: manager, tools (init), datascript (init)
[kb.txt](https://github.com/user-attachments/files/15923876/kb.txt)

Y-Rookie commented 5 months ago

This looks like a general issue rather than one specific to Redis. It should already have been fixed by another PR; please retest.

ahjing99 commented 5 months ago

Cannot reproduce, closing.