apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0

[BUG] starrocks ent shared-nothing cluster rebuild be instance: Backend node not found #7860

Open JashBook opened 3 months ago

JashBook commented 3 months ago

Describe the bug

kbcli version
Kubernetes: v1.29.6-gke.1038001
KubeBlocks: 0.9.1-beta.6
kbcli: 0.9.0

ERROR 1064 (HY000): Backend node not found. Check if any backend node is down.backend: [strsent-lxdilk-be-0.strsent-lxdilk-be-headless.default.svc.cluster.local alive: false inBlacklist: false] [strsent-lxdilk-be-1.strsent-lxdilk-be-headless.default.svc.cluster.local alive: true inBlacklist: false] [strsent-lxdilk-be-2.strsent-lxdilk-be-headless.default.svc.cluster.local alive: true inBlacklist: false]

To Reproduce

Steps to reproduce the behavior:

  1. create a starrocks ent shared-nothing cluster

    apiVersion: apps.kubeblocks.io/v1alpha1
    kind: Cluster
    metadata:
      name: strsent-lxdilk
      namespace: default
    spec:
      terminationPolicy: Delete
      componentSpecs:
      - name: be
        componentDef: starrocks-be
        serviceAccountName: kb-strsent-lxdilk
        replicas: 2
        resources:
          requests:
            cpu: 3000m
            memory: 8Gi
          limits:
            cpu: 3000m
            memory: 8Gi
        volumeClaimTemplates:
          - name: data
            spec:
              storageClassName:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 20Gi
      - name: fe
        componentDef: starrocks-fe-sn
        serviceAccountName: kb-strsent-lxdilk
        replicas: 2
        resources:
          requests:
            cpu: 3000m
            memory: 8Gi
          limits:
            cpu: 3000m
            memory: 8Gi
        volumeClaimTemplates:
          - name: data
            spec:
              storageClassName:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 20Gi

kbcli cluster list-instances strsent-lxdilk --namespace default 

NAME                  NAMESPACE   CLUSTER          COMPONENT   STATUS    ROLE     ACCESSMODE   AZ              CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE     NODE                                                             CREATED-TIME                 
strsent-lxdilk-be-0   default     strsent-lxdilk   be          Running   <none>   <none>       us-central1-f   3 / 3                8Gi / 8Gi               data:20Gi   gke-infracreate-gke-kbdata-e2-standar-25c8fd47-whii/10.10.0.31   Jul 23,2024 12:05 UTC+0800   
strsent-lxdilk-be-1   default     strsent-lxdilk   be          Running   <none>   <none>       us-central1-f   3 / 3                8Gi / 8Gi               data:20Gi   gke-infracreate-gke-kbdata-e2-standar-25c8fd47-whii/10.10.0.31   Jul 23,2024 12:05 UTC+0800   
strsent-lxdilk-fe-0   default     strsent-lxdilk   fe          Running   <none>   <none>       us-central1-f   3 / 3                8Gi / 8Gi               data:20Gi   gke-infracreate-gke-kbdata-4c8g-a94cd103-kjll/10.10.0.119        Jul 23,2024 12:05 UTC+0800   
strsent-lxdilk-fe-1   default     strsent-lxdilk   fe          Running   <none>   <none>       us-central1-a   3 / 3                8Gi / 8Gi               data:20Gi   gke-infracreate-gke-kbdata-e2-standar-765d90c7-r9z4/10.10.0.79   Jul 23,2024 12:05 UTC+0800   
  2. insert data

    kubectl exec -it strsent-lxdilk-fe-0 -c fe --namespace default bash

    mysql -P9030 -hstrsent-lxdilk-fe-fe.default.svc -uroot -p'8ml3Sg3m97'

    CREATE DATABASE IF NOT EXISTS mydb;
    use mydb;
    DROP TABLE IF EXISTS tmp_table;
    CREATE TABLE IF NOT EXISTS tmp_table (id INT, value STRING) PROPERTIES ( 'replication_num' = '1' );
    INSERT INTO tmp_table (id, value) VALUES (1,'ljledwjjae');

  3. rebuild instance be

    apiVersion: apps.kubeblocks.io/v1alpha1
    kind: OpsRequest
    metadata:
      generateName: strsent-lxdilk-rebuildinstance-
      namespace: default
    spec:
      type: RebuildInstance
      clusterRef: strsent-lxdilk
      force: true
      rebuildFrom:

  4. See error

    kubectl get cluster strsent-lxdilk
    NAME             CLUSTER-DEFINITION   VERSION   TERMINATION-POLICY   STATUS    AGE
    strsent-lxdilk                                  Delete               Running   16m

    ➜ ~ kubectl get pod -l app.kubernetes.io/instance=strsent-lxdilk
    NAME                  READY   STATUS    RESTARTS      AGE
    strsent-lxdilk-be-1   3/3     Running   2 (14m ago)   17m
    strsent-lxdilk-be-2   3/3     Running   0             12m
    strsent-lxdilk-fe-0   3/3     Running   0             17m
    strsent-lxdilk-fe-1   3/3     Running   1 (14m ago)   17m

    kubectl exec -it strsent-lxdilk-fe-0 -c fe --namespace default bash

    mysql -P9030 -hstrsent-lxdilk-fe-fe.default.svc -uroot -p'8ml3Sg3m97'

    use mydb;

    SELECT value FROM tmp_table WHERE id = 1;
    ERROR 1064 (HY000): Backend node not found. Check if any backend node is down.backend: [strsent-lxdilk-be-0.strsent-lxdilk-be-headless.default.svc.cluster.local alive: false inBlacklist: false] [strsent-lxdilk-be-1.strsent-lxdilk-be-headless.default.svc.cluster.local alive: true inBlacklist: false] [strsent-lxdilk-be-2.strsent-lxdilk-be-headless.default.svc.cluster.local alive: true inBlacklist: false]
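
The error text already names the backend that the FE treats as dead. As a rough sketch, assuming the message keeps the `[host alive: ... inBlacklist: ...]` layout shown above, the unreachable hosts can be pulled out with standard shell tools. `dead_backends` is a hypothetical helper name, not part of kbcli or StarRocks.

```shell
#!/bin/sh
# dead_backends (hypothetical helper): read an FE error message on stdin and
# print the host of every backend entry reported as "alive: false".
dead_backends() {
  # Put each "[host alive: ... inBlacklist: ...]" entry on its own line,
  # then keep only the host names of entries that are not alive.
  tr '[' '\n' | sed -n 's/^\([^ ]*\) alive: false.*/\1/p'
}

# Example (paste the full error text in place of the placeholder):
#   printf '%s\n' "$error_message" | dead_backends
```

For the message in this report it prints only the `strsent-lxdilk-be-0` host. That matches the pod listing above: after the rebuild the running BE pods are `be-1` and `be-2`, while the FE still lists `be-0` as a registered backend.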



JashBook commented 3 months ago

After rebuilding the instance with inPlace: true, the query fails with a different error:

ERROR 1064 (HY000): Build Exec OlapScanNode fail, scan info is invalid

  1. create cluster

    apiVersion: apps.kubeblocks.io/v1alpha1
    kind: Cluster
    metadata:
      name: strsent-drhbqv
      namespace: default
    spec:
      terminationPolicy: Delete
      componentSpecs:
      - name: be
        componentDef: starrocks-be
        serviceAccountName: kb-strsent-drhbqv
        replicas: 2
        resources:
          requests:
            cpu: 3000m
            memory: 8Gi
          limits:
            cpu: 3000m
            memory: 8Gi
        volumeClaimTemplates:
          - name: data
            spec:
              storageClassName:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 20Gi
      - name: fe
        componentDef: starrocks-fe-sn
        serviceAccountName: kb-strsent-drhbqv
        replicas: 2
        resources:
          requests:
            cpu: 3000m
            memory: 8Gi
          limits:
            cpu: 3000m
            memory: 8Gi
        volumeClaimTemplates:
          - name: data
            spec:
              storageClassName:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 20Gi
    
    kbcli cluster list-instances strsent-drhbqv --namespace default

    NAME                  NAMESPACE   CLUSTER          COMPONENT   STATUS    ROLE     ACCESSMODE   AZ              CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE     NODE                                                              CREATED-TIME
    strsent-drhbqv-be-0   default     strsent-drhbqv   be          Running   <none>   <none>       us-central1-a   3 / 3                8Gi / 8Gi               data:20Gi   gke-infracreate-gke-kbdata-4c8g-b9b386a5-flt6/10.10.0.32          Jul 23,2024 12:52 UTC+0800
    strsent-drhbqv-be-1   default     strsent-drhbqv   be          Running   <none>   <none>       us-central1-a   3 / 3                8Gi / 8Gi               data:20Gi   gke-infracreate-gke-kbdata-e2-standar-765d90c7-5fhz/10.10.0.124   Jul 23,2024 12:52 UTC+0800
    strsent-drhbqv-fe-0   default     strsent-drhbqv   fe          Running   <none>   <none>       us-central1-b   3 / 3                8Gi / 8Gi               data:20Gi   gke-infracreate-gke-kbdata-4c8g-118373b4-6ddk/10.10.0.121         Jul 23,2024 12:52 UTC+0800
    strsent-drhbqv-fe-1   default     strsent-drhbqv   fe          Running   <none>   <none>       us-central1-f   3 / 3                8Gi / 8Gi               data:20Gi   gke-infracreate-gke-kbdata-e2-standar-25c8fd47-whii/10.10.0.31    Jul 23,2024 12:52 UTC+0800

  2. insert data

    kubectl exec -it strsent-drhbqv-fe-0 -c fe --namespace default bash

    mysql -P9030 -hstrsent-drhbqv-fe-fe.default.svc -uroot -p'h55O8t1s9I'

    CREATE DATABASE IF NOT EXISTS mydb;
    use mydb;
    DROP TABLE IF EXISTS tmp_table;
    CREATE TABLE IF NOT EXISTS tmp_table (id INT, value STRING) PROPERTIES ( 'replication_num' = '1' );
    INSERT INTO tmp_table (id, value) VALUES (1,'peyizhlofa');

  3. rebuild instance

    kind: OpsRequest
    metadata:
      generateName: strsent-drhbqv-rebuildinstance-
      namespace: default
    spec:
      type: RebuildInstance
      clusterRef: strsent-drhbqv
      force: true
      rebuildFrom:

    ➜ ~ kubectl get pod -l app.kubernetes.io/instance=strsent-drhbqv
    NAME                  READY   STATUS    RESTARTS   AGE
    strsent-drhbqv-be-0   3/3     Running   0          8m4s
    strsent-drhbqv-be-1   3/3     Running   0          15m
    strsent-drhbqv-fe-0   3/3     Running   0          15m
    strsent-drhbqv-fe-1   3/3     Running   0          15m

    kbcli cluster list-ops strsent-drhbqv --status all --namespace default

    NAME                                   NAMESPACE   TYPE              CLUSTER          COMPONENT   STATUS    PROGRESS   CREATED-TIME
    strsent-drhbqv-rebuildinstance-fgs9n   default     RebuildInstance   strsent-drhbqv   be          Succeed   1/1        Jul 23,2024 13:00 UTC+0800

  4. see error

    kubectl exec -it strsent-drhbqv-fe-0 -c fe --namespace default bash

    mysql -P9030 -hstrsent-drhbqv-fe-fe.default.svc -uroot -p'h55O8t1s9I'

    use mydb;

    mysql> SELECT value FROM tmp_table WHERE id = 1;
    ERROR 1064 (HY000): Build Exec OlapScanNode fail, scan info is invalid
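
To narrow down where the table's single replica ended up after the rebuild, it may help to look at the FE's view of backends and tablets. The sketch below only prints two read-only statements, `SHOW BACKENDS` and `SHOW TABLET FROM`, which open-source StarRocks supports; that the enterprise build used here accepts them unchanged is an assumption. `diag_sql` is a hypothetical helper name, and its defaults are the database and table from this report.

```shell
#!/bin/sh
# diag_sql (hypothetical helper): print read-only StarRocks statements for
# inspecting backend registration and tablet placement.
# $1 = database (default: mydb), $2 = table (default: tmp_table)
diag_sql() {
  db=${1:-mydb}
  table=${2:-tmp_table}
  cat <<EOF
SHOW BACKENDS;
SHOW TABLET FROM ${db}.${table};
EOF
}

# Example, run inside the fe container (prompts for the root password):
#   diag_sql | mysql -P9030 -hstrsent-drhbqv-fe-fe.default.svc -uroot -p
```

Comparing the backend IDs in the tablet listing with the rows of `SHOW BACKENDS` should show whether the tablet still points at the backend that was rebuilt.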