apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0
1.8k stars 157 forks

[BUG] oceanbase ent distributed cluster stop and start pod CrashLoopBackOff #7714

Closed JashBook closed 1 week ago

JashBook commented 2 weeks ago

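Describe the bug: After a stop followed by a start, the pods of an OceanBase enterprise distributed cluster fail to come back up and go into CrashLoopBackOff. The observer container logs "IP changed, failed to rejoin the cluster".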

To Reproduce Steps to reproduce the behavior:

  1. create cluster
    apiVersion: apps.kubeblocks.io/v1alpha1
    kind: Cluster
    metadata:
      name: obent-xdoweh
      namespace: default
      annotations:
        "kubeblocks.io/extra-env": '{"TENANT_NAME":"tenant3","ZONE_COUNT":"3","TENANT_SYSTEM_TIME_ZONE":"+08:00","TENANT_COLLATION":"utf8mb4_general_ci","TENANT_CHARSET":"utf8mb4","OB_CLUSTERS_COUNT":"1","TENANT_CPU":"2","TENANT_MEMORY":"2G","TENANT_DISK":"5G"}'
    spec:
      terminationPolicy: Delete
      componentSpecs:
      - name: ob-bundle
        componentDefRef: oceanbase
        componentDef: oceanbase
        replicas: 3
        resources:
          requests:
            cpu: 3000m
            memory: 8Gi
          limits:
            cpu: 3000m
            memory: 8Gi
        volumeClaimTemplates:
          - name: data-file
            spec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: "50Gi"
          - name: data-log
            spec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: "50Gi"
          - name: log
            spec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: "10Gi"
          - name: workdir
            spec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: "100Mi"
  2. stop and start
    kbcli cluster stop obent-xdoweh --auto-approve --force=true
    kbcli cluster start obent-xdoweh --force=true
  3. See error

    kubectl get pod -l app.kubernetes.io/instance=obent-xdoweh
    NAME                       READY   STATUS   RESTARTS      AGE
    obent-xdoweh-ob-bundle-0   2/3     Error    2 (26s ago)   42s

    kubectl get ops -l app.kubernetes.io/instance=obent-xdoweh
    NAME                       TYPE    CLUSTER        STATUS    PROGRESS   AGE
    obent-xdoweh-start-tdwrw   Start   obent-xdoweh   Running   0/3        58s
    obent-xdoweh-stop-6xx4m    Stop    obent-xdoweh   Succeed   3/3        4m49s

  pod logs

    kubectl logs obent-xdoweh-ob-bundle-0 observer-container
    ORDINAL_INDEX: 0
    COMPONENT_INDEX: bundle
    ZONE_NAME: zone0
    COMP_MYSQL_PORT:
    COMP_RPC_PORT:
    OB_USE_CLUSTER_IP: enabled
    Recovering: True
    sql_port: 2881
    rpc_port: 2882
    IP changed, failed to rejoin the cluster
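
The last log line suggests that, when recovering, the startup logic compares the address the observer was previously registered with against the address it sees now, and refuses to rejoin on a mismatch. The following is only a minimal sketch of that kind of check, not the addon's actual script: the function name `check_rejoin`, its two arguments, and the success message are hypothetical and chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the OceanBase addon's real bootstrap script.
# check_rejoin RECORDED_IP CURRENT_IP
#   RECORDED_IP: address the observer used before the cluster was stopped
#   CURRENT_IP:  address available to the observer after the restart
check_rejoin() {
  local recorded_ip="$1" current_ip="$2"
  if [ "$recorded_ip" != "$current_ip" ]; then
    # Same message as in the issue's log output.
    echo "IP changed, failed to rejoin the cluster"
    return 1
  fi
  # Illustrative message only; the real script may print something else.
  echo "IP unchanged, rejoining the cluster"
  return 0
}
```

With `OB_USE_CLUSTER_IP: enabled`, the address in question would presumably be a Service ClusterIP rather than the pod IP, so a non-zero exit here on every restart would be consistent with the repeated container restarts seen above.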



shanshanying commented 2 weeks ago

The per-pod Service (per-pod-svc) is recycled on STOP in KB v0.9 (possibly since beta.33). Will update the addon.
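
One way to confirm that the Services are recreated with new addresses is to snapshot their ClusterIPs before the stop and after the start, then compare. This is a diagnostic sketch only: it assumes the per-pod Services carry the same `app.kubernetes.io/instance` label used for the pods above, and the helper names `snapshot_svc_ips` and `diff_svc_ips` are hypothetical.

```shell
#!/usr/bin/env bash
# Print "name<TAB>clusterIP" for every Service of the given cluster.
# Assumption: the per-pod Services are labelled app.kubernetes.io/instance=<cluster>.
snapshot_svc_ips() {
  kubectl get svc -l "app.kubernetes.io/instance=$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.clusterIP}{"\n"}{end}'
}

# Compare two snapshot files and print Services whose ClusterIP changed.
# diff_svc_ips BEFORE_FILE AFTER_FILE
diff_svc_ips() {
  awk -F'\t' '
    FILENAME == ARGV[1] { before[$1] = $2; next }   # first file: remember old IPs
    ($1 in before) && before[$1] != $2 {            # second file: report changes
      printf "%s: %s -> %s\n", $1, before[$1], $2
    }
  ' "$1" "$2"
}

# Usage (run around the stop/start from the reproduction steps):
#   snapshot_svc_ips obent-xdoweh > before.tsv
#   kbcli cluster stop obent-xdoweh --auto-approve --force=true
#   kbcli cluster start obent-xdoweh --force=true
#   snapshot_svc_ips obent-xdoweh > after.tsv
#   diff_svc_ips before.tsv after.tsv
```

Any line printed by `diff_svc_ips` names a Service that came back with a different ClusterIP, which would match the "IP changed" failure in the observer log.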

shanshanying commented 2 weeks ago

After discussion, @leon-inf will refine the STOP procedure in KB.

Will fix in 0.9.1

ahjing99 commented 1 week ago

Closed as fixed in 0.9.1-beta.2.