apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0
2.18k stars 181 forks source link

[BUG] mysql pod crash during mysql upgrade cmpv #8510

Open JashBook opened 14 hours ago

JashBook commented 14 hours ago

Describe the bug

kbcli version
Kubernetes: v1.30.6
KubeBlocks: 0.9.2-beta.22
kbcli: 0.9.1

To Reproduce Steps to reproduce the behavior:

  1. create cluster
    apiVersion: apps.kubeblocks.io/v1alpha1
    kind: Cluster
    metadata:
    name: asmysql-lslyqu
    namespace: default
    spec:
    clusterDefinitionRef: mysql
    topology: replication
    terminationPolicy: WipeOut
    componentSpecs:
    - name: mysql
      serviceVersion: 8.0.30
      replicas: 2
      resources:
        requests:
          cpu: 100m
          memory: 0.5Gi
        limits:
          cpu: 100m
          memory: 0.5Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 10Gi
  2. upgrade cmpv

    apiVersion: apps.kubeblocks.io/v1alpha1
    kind: OpsRequest
    metadata:
    generateName: asmysql-lslyqu-upgrade-cmpv-
    namespace: default
    spec:
    clusterName: asmysql-lslyqu
    upgrade:
    components:
    
    - componentName: mysql
      serviceVersion: 8.0.36
  3. See error
    
    kubectl get ops
    NAME                                TYPE      CLUSTER          STATUS   PROGRESS   AGE
    asmysql-lslyqu-upgrade-cmpv-9plb6   Upgrade   asmysql-lslyqu   Failed   2/2        11m

kubectl get pod NAME READY STATUS RESTARTS AGE asmysql-lslyqu-mysql-0 3/3 Running 10 (11m ago) 34m asmysql-lslyqu-mysql-1 3/3 Running 19 (11m ago) 34m

kubectl get cluster NAME CLUSTER-DEFINITION VERSION TERMINATION-POLICY STATUS AGE asmysql-lslyqu mysql WipeOut Running 35m

describe cluster

kubectl describe cluster asmysql-lslyqu Name: asmysql-lslyqu Namespace: default Labels: clusterdefinition.kubeblocks.io/name=mysql clusterversion.kubeblocks.io/name= Annotations: kubeblocks.io/reconcile: 2024-11-22T07:39:37.544227051Z API Version: apps.kubeblocks.io/v1alpha1 Kind: Cluster Metadata: Creation Timestamp: 2024-11-22T07:19:15Z Finalizers: cluster.kubeblocks.io/finalizer Generation: 13 Resource Version: 147026 UID: 2be297fb-0381-4ee3-a24a-01d41ee5b6d4 Spec: Cluster Definition Ref: mysql Component Specs: Component Def: mysql-8.0 Name: mysql Replicas: 2 Resources: Limits: Cpu: 100m Memory: 512Mi Requests: Cpu: 100m Memory: 512Mi Service Version: 8.0.36 Volume Claim Templates: Name: data Spec: Access Modes: ReadWriteOnce Resources: Requests: Storage: 10Gi Resources: Cpu: 0 Memory: 0 Storage: Size: 0 Termination Policy: WipeOut Topology: replication Status: Cluster Def Generation: 3 Components: Mysql: Message: InstanceSet/asmysql-lslyqu-mysql: ["asmysql-lslyqu-mysql-1"] Phase: Running Pods Ready: true Pods Ready Time: 2024-11-22T07:44:00Z Conditions: Last Transition Time: 2024-11-22T07:19:15Z Message: The operator has started the provisioning of Cluster: asmysql-lslyqu Observed Generation: 13 Reason: PreCheckSucceed Status: True Type: ProvisioningStarted Last Transition Time: 2024-11-22T07:19:15Z Message: Successfully applied for resources Observed Generation: 13 Reason: ApplyResourcesSucceed Status: True Type: ApplyResources Last Transition Time: 2024-11-22T07:44:00Z Message: all pods of components are ready, waiting for the probe detection successful Reason: AllReplicasReady Status: True Type: ReplicasReady Last Transition Time: 2024-11-22T07:44:00Z Message: Cluster: asmysql-lslyqu is ready, current phase is Running Reason: ClusterReady Status: True Type: Ready Observed Generation: 13 Phase: Running Events: Type Reason Age From Message


Normal ComponentPhaseTransition 35m cluster-controller component is Creating Normal ApplyResourcesSucceed 32m (x4 over 35m) cluster-controller Successfully applied for resources Normal ClusterReady 32m (x2 over 33m) cluster-controller Cluster: asmysql-lslyqu is ready, current phase is Running Normal Running 32m (x2 over 33m) cluster-controller Cluster: asmysql-lslyqu is ready, current phase is Running Warning ComponentsNotReady 32m (x2 over 32m) cluster-controller pods are unavailable in Components: [mysql], refer to related component message in Cluster.status.components Normal ComponentPhaseTransition 32m (x2 over 32m) cluster-controller component is Updating Warning ReplicasNotReady 32m (x2 over 32m) cluster-controller pods are not ready in Components: [mysql], refer to related component message in Cluster.status.components Normal AllReplicasReady 30m (x3 over 33m) cluster-controller all pods of components are ready, waiting for the probe detection successful Normal PreCheckSucceed 24m (x7 over 35m) cluster-controller The operator has started the provisioning of Cluster: asmysql-lslyqu Normal ComponentPhaseTransition 19m (x12 over 33m) cluster-controller component is Running Warning Unhealthy 14m (x2 over 19m) event-controller Pod asmysql-lslyqu-mysql-0: Readiness probe failed: HTTP probe failed with statuscode: 500 Warning BackOff 14m (x2 over 19m) event-controller Pod asmysql-lslyqu-mysql-1: Back-off restarting failed container mysql in pod asmysql-lslyqu-mysql-1_default(f4dc6ac2-7b34-4448-8ded-dc7a86b6bfcf) Normal ComponentPhaseTransition 14m (x6 over 28m) cluster-controller component is Failed

logs pod

kubectl logs asmysql-lslyqu-mysql-1 -c mysql --previous 2024-11-22T07:42:33Z INFO Initialize DB manager 2024-11-22T07:42:33Z DEBUG KB_WORKLOAD_TYPE ENV not set 2024-11-22T07:42:33Z INFO HTTPServer Starting HTTP Server 2024-11-22T07:42:33Z INFO HTTPServer API route path {"method": "GET", "path": ["/v1.0/datasync"]} 2024-11-22T07:42:33Z INFO HTTPServer API route path {"method": "POST", "path": ["/v1.0/rebuild", "/v1.0/start", "/v1.0/stop", "/v1.0/switchover"]} 2024-11-22T07:42:33Z INFO HTTPServer http server {"listen address": "0.0.0.0", "port": 3601} 2024-11-22T07:42:33Z INFO HA HA starting 2024-11-22T07:42:33Z INFO pinger Waiting for dns resolution to be ready 2024-11-22T07:42:33Z INFO pinger dns resolution is ready {"dns": "asmysql-lslyqu-mysql-1.asmysql-lslyqu-mysql-headless.default.svc"} 2024-11-22T07:42:33Z INFO DCS-K8S pod selector: app.kubernetes.io/instance=asmysql-lslyqu,app.kubernetes.io/managed-by=kubeblocks,apps.kubeblocks.io/component-name=mysql 2024-11-22T07:42:34Z INFO DCS-K8S podlist: 2 2024-11-22T07:42:34Z DEBUG HA cluster info {"cluster": {"ClusterCompName":"asmysql-lslyqu-mysql","Namespace":"default","Replicas":2,"HaConfig":{"DeleteMembers":{}},"Leader":{"DBState":{"OpTimestamp":1732261353,"Extra":{"gtid_executed":"29de7113-a8a2-11ef-90f7-8e9a3d91bb4e:1-1785","gtid_purged":"","hostname":"asmysql-lslyqu-mysql-0","read_only":"0","server_uuid":"29de7113-a8a2-11ef-90f7-8e9a3d91bb4e","super_read_only":"0"}},"Index":"146389","Name":"asmysql-lslyqu-mysql-0","AcquireTime":1732261303,"RenewTime":1732261353,"TTL":15,"Resource":{"metadata":{"name":"asmysql-lslyqu-mysql-leader","namespace":"default","uid":"c664289d-b282-44b5-b7f0-e5994ff15418","resourceVersion":"146389","creationTimestamp":"2024-11-22T07:20:29Z","labels":{"app.kubernetes.io/instance":"asmysql-lslyqu","app.kubernetes.io/managed-by":"kubeblocks","apps.kubeblocks.io/component-name":"mysql"},"annotations":{"acquire-time":"1732261303","dbstate":"{\"OpTimestamp\":1732261353,\"Extra\":{\"gtid_executed\":\"29de7113-a8a2-11ef-90f7-8e9a3d91bb4e:1-1785\",\"gtid_purged\":\"\",\"hostname\":\"asmysql-lslyqu-mysql-0\",\"read_only\":\"0\",\"server_uuid\":\"29de7113-a8a2-11ef-90f7-8e9a3d91bb4e\",\"super_read_only\":\"0\"}}","leader":"asmysql-lslyqu-mysql-0","renew-time":"1732261353","ttl":"15"},"ownerReferences":[{"apiVersion":"apps.kubeblocks.io/v1alpha1","kind":"Cluster","name":"asmysql-lslyqu","uid":"2be297fb-0381-4ee3-a24a-01d41ee5b6d4"}],"managedFields":[{"manager":"syncer","operation":"Update","apiVersion":"v1","time":"2024-11-22T07:42:33Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:acquire-time":{},"f:dbstate":{},"f:leader":{},"f:renew-time":{},"f:ttl":{}},"f:labels":{".":{},"f:app.kubernetes.io/instance":{},"f:app.kubernetes.io/managed-by":{},"f:apps.kubeblocks.io/component-name":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"2be297fb-0381-4ee3-a24a-01d41ee5b6d4\"}":{}}}}}]}}},"Members":[{"Index":"","Name":"asmysql-lslyqu-mysql-0","Role":"primary","PodIP":"10.244.1.64","DBPort":"3306","SyncerPort":"3601","UID":"e4a7c55a-393b-4cbb-bdd0-c2971216a087","ComponentName":"mysql","UseIP":false},{"Index":"","Name":"asmysql-lslyqu-mysql-1","Role":"secondary","PodIP":"10.244.2.180","DBPort":"3306","SyncerPort":"3601","UID":"f4dc6ac2-7b34-4448-8ded-dc7a86b6bfcf","ComponentName":"mysql","UseIP":false}],"Switchover":null,"Extra":null,"Resource":{"metadata":{"name":"asmysql-lslyqu","namespace":"default","uid":"2be297fb-0381-4ee3-a24a-01d41ee5b6d4","resourceVersion":"146384","generation":13,"creationTimestamp":"2024-11-22T07:19:15Z","labels":{"clusterdefinition.kubeblocks.io/name":"mysql","clusterversion.kubeblocks.io/name":""},"annotations":{"kubeblocks.io/ops-request":"[{\"name\":\"asmysql-lslyqu-upgrade-cmpv-9plb6\",\"type\":\"Upgrade\"}]","kubeblocks.io/reconcile":"2024-11-22T07:39:37.544227051Z","kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"apps.kubeblocks.io/v1alpha1\",\"kind\":\"Cluster\",\"metadata\":{\"annotations\":{},\"name\":\"asmysql-lslyqu\",\"namespace\":\"default\"},\"spec\":{\"clusterDefinitionRef\":\"mysql\",\"componentSpecs\":[{\"name\":\"mysql\",\"replicas\":2,\"resources\":{\"limits\":{\"cpu\":\"100m\",\"memory\":\"0.5Gi\"},\"requests\":{\"cpu\":\"100m\",\"memory\":\"0.5Gi\"}},\"serviceVersion\":\"8.0.30\",\"volumeClaimTemplates\":[{\"name\":\"data\",\"spec\":{\"accessModes\":[\"ReadWriteOnce\"],\"resources\":{\"requests\":{\"storage\":\"10Gi\"}},\"storageClassName\":null}}]}],\"terminationPolicy\":\"WipeOut\",\"topology\":\"replication\"}}\n"},"finalizers":["cluster.kubeblocks.io/finalizer"],"managedFields":[{"manager":"kubectl-client-side-apply","operation":"Update","apiVersion":"apps.kubeblocks.io/v1alpha1","time":"2024-11-22T07:19:15Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:clusterDefinitionRef":{},"f:terminationPolicy":{},"f:topology":{}}}},{"manager":"kubeblocks","operation":"Update","apiVersion":"apps.kubeblocks.io/v1alpha1","time":"2024-11-22T07:42:32Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{"f:kubeblocks.io/ops-request":{},"f:kubeblocks.io/reconcile":{}},"f:finalizers":{".":{},"v:\"cluster.kubeblocks.io/finalizer\"":{}},"f:labels":{".":{},"f:clusterdefinition.kubeblocks.io/name":{},"f:clusterversion.kubeblocks.io/name":{}}},"f:spec":{"f:componentSpecs":{},"f:resources":{".":{},"f:cpu":{},"f:memory":{}},"f:storage":{".":{},"f:size":{}}}}},{"manager":"kubeblocks","operation":"Update","apiVersion":"apps.kubeblocks.io/v1alpha1","time":"2024-11-22T07:42:32Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:clusterDefGeneration":{},"f:components":{".":{},"f:mysql":{".":{},"f:message":{".":{},"f:InstanceSet/asmysql-lslyqu-mysql":{}},"f:phase":{},"f:podsReady":{},"f:podsReadyTime":{}}},"f:conditions":{},"f:observedGeneration":{},"f:phase":{}}},"subresource":"status"}]},"spec":{"clusterDefinitionRef":"mysql","terminationPolicy":"WipeOut","componentSpecs":[{"name":"mysql","componentDef":"mysql-8.0","replicas":2,"resources":{"limits":{"cpu":"100m","memory":"512Mi"},"requests":{"cpu":"100m","memory":"512Mi"}},"volumeClaimTemplates":[{"name":"data","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"10Gi"}}}}]}],"resources":{"cpu":"0","memory":"0"},"storage":{"size":"0"},"monitor":{}},"status":{"observedGeneration":13,"phase":"Updating","components":{"mysql":{"phase":"Updating","message":{"InstanceSet/asmysql-lslyqu-mysql":"[\"asmysql-lslyqu-mysql-1\"]"},"podsReady":false,"podsReadyTime":"2024-11-22T07:41:50Z"}},"clusterDefGeneration":3,"conditions":[{"type":"ProvisioningStarted","status":"True","observedGeneration":13,"lastTransitionTime":"2024-11-22T07:19:15Z","reason":"PreCheckSucceed","message":"The operator has started the provisioning of Cluster: asmysql-lslyqu"},{"type":"ApplyResources","status":"True","observedGeneration":13,"lastTransitionTime":"2024-11-22T07:19:15Z","reason":"ApplyResourcesSucceed","message":"Successfully applied for resources"},{"type":"ReplicasReady","status":"False","lastTransitionTime":"2024-11-22T07:42:32Z","reason":"ReplicasNotReady","message":"pods are not ready in Components: [mysql], refer to related component message in Cluster.status.components"},{"type":"Ready","status":"False","lastTransitionTime":"2024-11-22T07:42:32Z","reason":"ComponentsNotReady","message":"pods are unavailable in Components: [mysql], refer to related component message in Cluster.status.components"}]}}}} 2024-11-22T07:42:34Z INFO Hypervisor Starting Hypervisor 2024-11-22T07:42:34Z INFO Hypervisor Start DB Service {"command": "/usr/bin/bash -c cp /var/lib/mysql/plugin/audit_log.so /usr/lib64/mysql/plugin/\nchown -R mysql:root /var/lib/mysql\nskip_slave_start=\"OFF\"\nexport LD_PRELOAD=/tools/lib/libjemalloc.so.2 \nif [ -f /var/lib/mysql/data/.restore_new_cluster ]; then\n skip_slave_start=\"ON\"\nfi\n/scripts/mysql-entrypoint.sh\n"} 2024-11-22T07:42:34Z INFO Hypervisor Starting watcher on dbService 2024-11-22T07:42:34Z INFO MySQL DB is not ready {"error": "dial tcp 127.0.0.1:3306: connect: connection refused"} == DB == 2024-11-22 07:42:34+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.36-1.el8 started. == DB == 2024-11-22 07:42:38+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql' == DB == 2024-11-22 07:42:38+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.36-1.el8 started. 2024-11-22T07:42:39Z INFO MySQL wait for db service to be ready 2024-11-22T07:42:39Z INFO MySQL DB is not ready {"error": "dial tcp 127.0.0.1:3306: connect: connection refused"} == DB == '/var/lib/mysql/mysql.sock' -> '/var/run/mysqld/mysqld.sock' 2024-11-22T07:42:44Z INFO MySQL wait for db service to be ready 2024-11-22T07:42:44Z INFO MySQL DB is not ready {"error": "dial tcp 127.0.0.1:3306: connect: connection refused"} 2024-11-22T07:42:49Z INFO MySQL wait for db service to be ready 2024-11-22T07:42:49Z INFO MySQL DB is not ready {"error": "dial tcp 127.0.0.1:3306: connect: connection refused"} 2024-11-22T07:42:54Z INFO MySQL wait for db service to be ready 2024-11-22T07:42:54Z INFO MySQL DB is not ready {"error": "dial tcp 127.0.0.1:3306: connect: connection refused"} 2024-11-22T07:42:59Z INFO MySQL wait for db service to be ready 2024-11-22T07:43:00Z INFO MySQL Set semi-sync source timeout {"sql": "SET GLOBAL rpl_semi_sync_source_timeout = 10000;", "leader": "asmysql-lslyqu-mysql-0"} panic: start DB service failed: SET GLOBAL rpl_semi_sync_source_timeout = 10000; execute failed: dial tcp 10.244.1.64:3306: connect: connection refused

goroutine 16 [running]: github.com/apecloud/syncer/highavailability.(*Ha).Start(0xc000098780) /src/highavailability/ha.go:256 +0xad3 created by github.com/apecloud/syncer/highavailability.Start in goroutine 1 /src/highavailability/ha.go:69 +0x285



**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Desktop (please complete the following information):**
 - OS: [e.g. iOS]
 - Browser [e.g. chrome, safari]
 - Version [e.g. 22]

**Additional context**
Add any other context about the problem here.