apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0

[BUG] starrocks ent fe pod always PodInitializing after hscale out fe and be then restart #7663

Open JashBook opened 5 days ago

JashBook commented 5 days ago

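**Describe the bug**
After scaling out the FE and BE components of a StarRocks ent cluster and then restarting them, pod `strsent-nerqht-fe-0` stays in `PodInitializing`.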

**To Reproduce**
Steps to reproduce the behavior:

1. Install the StarRocks ent addon:

   ```bash
   helm upgrade --install --namespace kb-system kb-addon-starrocks kubeblocks-enterprise/starrocks --version 0.9.0
   ```

2. Create a StarRocks ent cluster:

   ```yaml
   apiVersion: apps.kubeblocks.io/v1alpha1
   kind: Cluster
   metadata:
     name: strsent-nerqht
     namespace: default
   spec:
     terminationPolicy: Delete
     componentSpecs:
     - name: be
       componentDef: starrocks-be
       replicas: 1
       resources:
         requests:
           cpu: 1000m
           memory: 1Gi
         limits:
           cpu: 1000m
           memory: 1Gi
       volumeClaimTemplates:
         - name: data
           spec:
             storageClassName:
             accessModes:
               - ReadWriteOnce
             resources:
               requests:
                 storage: 20Gi
     - name: fe
       componentDef: starrocks-fe-sn
       replicas: 1
       resources:
         requests:
           cpu: 1000m
           memory: 1Gi
         limits:
           cpu: 1000m
           memory: 1Gi
       volumeClaimTemplates:
         - name: data
           spec:
             storageClassName:
             accessModes:
               - ReadWriteOnce
             resources:
               requests:
                 storage: 20Gi
   ```
3. Do some ops:

   ```bash
   kbcli cluster hscale strsent-nerqht --auto-approve --force=true --components fe --replicas 2 --namespace default
   kbcli cluster hscale strsent-nerqht --auto-approve --force=true --components be --replicas 2 --namespace default
   kbcli cluster vscale strsent-nerqht --auto-approve --force=true --components fe --cpu 1100m --memory 2Gi --namespace default
   ```

4. See the error:

   ```
   ➜  ~ kubectl get pod -l app.kubernetes.io/instance=strsent-nerqht
   NAME                  READY   STATUS            RESTARTS   AGE
   strsent-nerqht-be-0   3/3     Running           0          23m
   strsent-nerqht-be-1   3/3     Running           0          22m
   strsent-nerqht-fe-0   0/3     PodInitializing   0          20m
   strsent-nerqht-fe-1   3/3     Running           0          20m
   ➜  ~ kubectl get ops -l app.kubernetes.io/instance=strsent-nerqht
   NAME                                   TYPE              CLUSTER          STATUS    PROGRESS   AGE
   strsent-nerqht-verticalscaling-9vbgm   VerticalScaling   strsent-nerqht   Running   1/2        20m
   ➜  ~ kubectl get cluster strsent-nerqht
   NAME             CLUSTER-DEFINITION   VERSION   TERMINATION-POLICY   STATUS     AGE
   strsent-nerqht                                  Delete               Updating   43m
   ```
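To pick the stuck pod out of a `kubectl get pod` listing like the one above, a small filter can compare the two halves of the READY column and check STATUS. This is a minimal sketch that assumes the default column layout (NAME READY STATUS RESTARTS AGE); the function name `not_ready_pods` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print pods from `kubectl get pod` output (on stdin)
# that are not fully ready or whose STATUS is not Running.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")   # READY column, e.g. "0/3" -> ready[1]=0, ready[2]=3
    if (ready[1] != ready[2] || $3 != "Running") print $1, $2, $3
  }'
}

# Sample rows copied from the `kubectl get pod` output in this report.
sample='NAME                  READY   STATUS            RESTARTS   AGE
strsent-nerqht-be-0   3/3     Running           0          23m
strsent-nerqht-be-1   3/3     Running           0          22m
strsent-nerqht-fe-0   0/3     PodInitializing   0          20m
strsent-nerqht-fe-1   3/3     Running           0          20m'

printf '%s\n' "$sample" | not_ready_pods
# prints: strsent-nerqht-fe-0 0/3 PodInitializing
```

Against a live cluster the same filter would be fed directly, e.g. `kubectl get pod -l app.kubernetes.io/instance=strsent-nerqht | not_ready_pods`.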

describe cluster

```
kubectl describe cluster strsent-nerqht
Name:         strsent-nerqht
Namespace:    default
Labels:       app.kubernetes.io/instance=strsent-nerqht
Annotations:  kubeblocks.io/ops-request: [{"name":"strsent-nerqht-verticalscaling-9vbgm","type":"VerticalScaling"}]
              kubeblocks.io/reconcile: 2024-06-28T01:35:33.833177453Z
API Version:  apps.kubeblocks.io/v1alpha1
Kind:         Cluster
Metadata:
  Creation Timestamp:  2024-06-28T01:34:19Z
  Finalizers:
    cluster.kubeblocks.io/finalizer
  Generation:  12
  Managed Fields:
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:terminationPolicy:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2024-06-28T01:34:19Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:app.kubernetes.io/instance:
    Manager:      kbcli
    Operation:    Update
    Time:         2024-06-28T01:36:43Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:kubeblocks.io/ops-request:
          f:kubeblocks.io/reconcile:
        f:finalizers:
          .:
          v:"cluster.kubeblocks.io/finalizer":
      f:spec:
        f:componentSpecs:
        f:resources:
          .:
          f:cpu:
          f:memory:
        f:services:
        f:storage:
          .:
          f:size:
    Manager:      manager
    Operation:    Update
    Time:         2024-06-28T01:56:12Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:components:
          .:
          f:be:
            .:
            f:phase:
            f:podsReady:
            f:podsReadyTime:
          f:fe:
            .:
            f:phase:
            f:podsReady:
            f:podsReadyTime:
        f:conditions:
        f:observedGeneration:
        f:phase:
    Manager:         manager
    Operation:       Update
    Subresource:     status
    Time:            2024-06-28T01:56:14Z
  Resource Version:  410692079
  UID:               9b4a3e93-5c2c-4d6a-8240-f6d742bd3e4c
Spec:
  Component Specs:
    Component Def:  starrocks-be
    Name:           be
    Replicas:       2
    Resources:
      Limits:
        Cpu:     1100m
        Memory:  2Gi
      Requests:
        Cpu:          1100m
        Memory:       2Gi
    Service Version:  3.2.2
    Volume Claim Templates:
      Name:  data
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:  20Gi
    Component Def:    starrocks-fe-sn
    Name:             fe
    Replicas:         2
    Resources:
      Limits:
        Cpu:     1100m
        Memory:  2Gi
      Requests:
        Cpu:          1100m
        Memory:       2Gi
    Service Version:  3.2.2
    Volume Claim Templates:
      Name:  data
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:  24Gi
  Resources:
    Cpu:     0
    Memory:  0
  Services:
    Annotations:
      networking.gke.io/load-balancer-type:  Internal
    Component Selector:                      fe
    Name:                                    fe-vpc
    Service Name:                            fe-vpc
    Spec:
      Ports:
        Name:         fe-http
        Node Port:    30243
        Port:         8030
        Protocol:     TCP
        Target Port:  http-port
        Name:         fe-mysql
        Node Port:    30369
        Port:         9030
        Protocol:     TCP
        Target Port:  query-port
      Type:           LoadBalancer
  Storage:
    Size:              0
  Termination Policy:  Delete
Status:
  Components:
    Be:
      Phase:            Running
      Pods Ready:       true
      Pods Ready Time:  2024-06-28T01:56:14Z
    Fe:
      Phase:            Updating
      Pods Ready:       false
      Pods Ready Time:  2024-06-28T01:54:53Z
  Conditions:
    Last Transition Time:  2024-06-28T01:34:19Z
    Message:               The operator has started the provisioning of Cluster: strsent-nerqht
    Observed Generation:   12
    Reason:                PreCheckSucceed
    Status:                True
    Type:                  ProvisioningStarted
    Last Transition Time:  2024-06-28T01:38:05Z
    Message:               Successfully applied for resources
    Observed Generation:   12
    Reason:                ApplyResourcesSucceed
    Status:                True
    Type:                  ApplyResources
    Last Transition Time:  2024-06-28T01:56:13Z
    Message:               pods are not ready in Components: [fe], refer to related component message in Cluster.status.components
    Observed Generation:   12
    Reason:                ReplicasNotReady
    Status:                False
    Type:                  ReplicasReady
    Last Transition Time:  2024-06-28T01:56:13Z
    Message:               pods are unavailable in Components: [fe], refer to related component message in Cluster.status.components
    Observed Generation:   12
    Reason:                ComponentsNotReady
    Status:                False
    Type:                  Ready
  Observed Generation:     12
  Phase:                   Updating
Events:
  Type     Reason                    Age                 From                  Message
  Normal   ComponentPhaseTransition  47m (x2 over 47m)   cluster-controller    component is Creating
  Warning  Unhealthy                 45m (x5 over 46m)   event-controller      Pod strsent-nerqht-be-0: Startup probe failed: Get "http://10.128.2.63:8040/api/health": dial tcp 10.128.2.63:8040: connect: connection refused
  Normal   AllReplicasReady          45m                 cluster-controller    all pods of components are ready, waiting for the probe detection successful
  Normal   ClusterReady              45m                 cluster-controller    Cluster: strsent-nerqht is ready, current phase is Running
  Normal   Running                   45m                 cluster-controller    Cluster: strsent-nerqht is ready, current phase is Running
  Warning  ComponentsNotReady        43m (x2 over 45m)   cluster-controller    pods are unavailable in Components: [be], refer to related component message in Cluster.status.components
  Warning  ReplicasNotReady          43m (x2 over 45m)   cluster-controller    pods are not ready in Components: [be], refer to related component message in Cluster.status.components
  Normal   ApplyResourcesSucceed     43m (x6 over 47m)   cluster-controller    Successfully applied for resources
  Warning  ReplicasNotReady          43m                 cluster-controller    pods are not ready in Components: [be fe], refer to related component message in Cluster.status.components
  Normal   ComponentPhaseTransition  43m (x2 over 43m)   cluster-controller    component is Updating
  Normal   HorizontalScale           41m (x2 over 41m)   component-controller  start horizontal scale component fe of cluster strsent-nerqht from 1 to 2
  Normal   HorizontalScale           33m                 component-controller  start horizontal scale component fe of cluster strsent-nerqht from 2 to 0
  Normal   HorizontalScale           33m                 component-controller  start horizontal scale component be of cluster strsent-nerqht from 1 to 0
  Normal   HorizontalScale           32m                 component-controller  start horizontal scale component fe of cluster strsent-nerqht from 0 to 2
  Normal   HorizontalScale           32m                 component-controller  start horizontal scale component be of cluster strsent-nerqht from 0 to 1
  Normal   ComponentPhaseTransition  32m (x12 over 45m)  cluster-controller    component is Running
  Normal   PreCheckSucceed           26m (x11 over 47m)  cluster-controller    The operator has started the provisioning of Cluster: strsent-nerqht
  Normal   HorizontalScale           26m (x2 over 26m)   component-controller  start horizontal scale component be of cluster strsent-nerqht from 1 to 2
```
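The HorizontalScale events above appear to record FE and BE being scaled down to 0 and back up, which goes beyond the three commands listed in the reproduction steps. A minimal sketch for pulling those transitions out of the describe output, assuming the event message keeps the `start horizontal scale component <name> of cluster <name> from <n> to <m>` wording seen here; `scale_timeline` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn HorizontalScale event messages (on stdin)
# into "component: from -> to" lines, in the order they appear.
scale_timeline() {
  grep -oE 'horizontal scale component [^ ]+ of cluster [^ ]+ from [0-9]+ to [0-9]+' |
    awk '{ print $4 ": " $9 " -> " $11 }'   # fields 4, 9, 11 = component, from, to
}

# Event messages copied from the `kubectl describe cluster` output above.
events='start horizontal scale component fe of cluster strsent-nerqht from 1 to 2
start horizontal scale component fe of cluster strsent-nerqht from 2 to 0
start horizontal scale component be of cluster strsent-nerqht from 1 to 0
start horizontal scale component fe of cluster strsent-nerqht from 0 to 2
start horizontal scale component be of cluster strsent-nerqht from 0 to 1
start horizontal scale component be of cluster strsent-nerqht from 1 to 2'

printf '%s\n' "$events" | scale_timeline
```

On a live cluster the input can come straight from `kubectl describe cluster strsent-nerqht | scale_timeline`.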


describe pod-0 fe

```
kubectl describe pod strsent-nerqht-fe-0
Name:             strsent-nerqht-fe-0
Namespace:        default
Priority:         0
Node:             gke-infracreate-gke-kbdata-e2-standar-25c8fd47-9yic/10.10.0.70
Start Time:       Fri, 28 Jun 2024 09:56:57 +0800
Labels:           app.kubernetes.io/component=starrocks-fe-sn
                  app.kubernetes.io/instance=strsent-nerqht
                  app.kubernetes.io/managed-by=kubeblocks
                  app.kubernetes.io/name=starrocks-fe-sn
                  app.kubernetes.io/version=starrocks-fe-sn
                  apps.kubeblocks.io/cluster-uid=9b4a3e93-5c2c-4d6a-8240-f6d742bd3e4c
                  apps.kubeblocks.io/component-name=fe
                  apps.kubeblocks.io/pod-name=strsent-nerqht-fe-0
                  componentdefinition.kubeblocks.io/name=starrocks-fe-sn
                  controller-revision-hash=6bc67dbc6c
                  workloads.kubeblocks.io/instance=strsent-nerqht-fe
                  workloads.kubeblocks.io/managed-by=InstanceSet
Annotations:      apps.kubeblocks.io/component-replicas: 2
                  kubeblocks.io/restart: 2024-06-28T01:50:32Z
Status:           Pending
IP:               10.128.2.114
IPs:
  IP:           10.128.2.114
Controlled By:  InstanceSet/strsent-nerqht-fe
Init Containers:
  init-lorry:
    Container ID:  containerd://8826873260aa831e8f604d768454b96fb08f28bc16c85bff916afc13dd365130
    Image:         docker.io/apecloud/kubeblocks-tools:0.9.0-beta.39
    Image ID:      docker.io/apecloud/kubeblocks-tools@sha256:5c137c9ae94ef615be726bbd35df0a31217a3701b1c64e5773321b88e287afa8
    Port:
    Host Port:
    Command:
      cp -r /bin/lorry /config /kubeblocks/
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 28 Jun 2024 09:56:59 +0800
      Finished:     Fri, 28 Jun 2024 09:57:01 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Environment Variables from:
      strsent-nerqht-fe-env  ConfigMap  Optional: false
    Environment:
      STARROCKS_USER:      <set to the key 'username' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      STARROCKS_PASSWORD:  <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      MYSQL_PWD:           <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      KB_POD_NAME:         strsent-nerqht-fe-0 (v1:metadata.name)
      KB_POD_UID:           (v1:metadata.uid)
      KB_NAMESPACE:        default (v1:metadata.namespace)
      KB_SA_NAME:           (v1:spec.serviceAccountName)
      KB_NODENAME:          (v1:spec.nodeName)
      KB_HOST_IP:           (v1:status.hostIP)
      KB_POD_IP:            (v1:status.podIP)
      KB_POD_IPS:           (v1:status.podIPs)
      KB_HOSTIP:            (v1:status.hostIP)
      KB_PODIP:             (v1:status.podIP)
      KB_PODIPS:            (v1:status.podIPs)
      KB_POD_FQDN:         $(KB_POD_NAME).strsent-nerqht-fe-headless.$(KB_NAMESPACE).svc
    Mounts:
      /kubeblocks from kubeblocks (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jljd4 (ro)
  starrocks-tools:
    Container ID:  containerd://5882df76793f3385db1ad3625455e5199cb8b92e6d128d75d0b1f5e68e938a7c
    Image:         docker.io/apecloud/starrocks-tools:3.2.2
    Image ID:      docker.io/apecloud/starrocks-tools@sha256:fd9b4e989932b172368cdd1de986845ea96c0d5c19efd4c7fe3bea11bd7aa0f5
    Port:
    Host Port:
    Command:
      cp /bin/mysql /kb_tools/mysql
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 28 Jun 2024 09:57:03 +0800
      Finished:     Fri, 28 Jun 2024 09:57:04 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Environment Variables from:
      strsent-nerqht-fe-env  ConfigMap  Optional: false
    Environment:
      STARROCKS_USER:      <set to the key 'username' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      STARROCKS_PASSWORD:  <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      MYSQL_PWD:           <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      KB_POD_NAME:         strsent-nerqht-fe-0 (v1:metadata.name)
      KB_POD_UID:           (v1:metadata.uid)
      KB_NAMESPACE:        default (v1:metadata.namespace)
      KB_SA_NAME:           (v1:spec.serviceAccountName)
      KB_NODENAME:          (v1:spec.nodeName)
      KB_HOST_IP:           (v1:status.hostIP)
      KB_POD_IP:            (v1:status.podIP)
      KB_POD_IPS:           (v1:status.podIPs)
      KB_HOSTIP:            (v1:status.hostIP)
      KB_PODIP:             (v1:status.podIP)
      KB_PODIPS:            (v1:status.podIPs)
      KB_POD_FQDN:         $(KB_POD_NAME).strsent-nerqht-fe-headless.$(KB_NAMESPACE).svc
      TOOLS_SCRIPTS_PATH:  /opt/kb-tools/reload/fe-cm
    Mounts:
      /kb_tools from kb-tools (rw)
      /opt/config-manager from config-manager-config (rw)
      /opt/kb-tools/reload/fe-cm from cm-script-fe-cm (rw)
      /opt/starrocks/fe/conf from fe-cm (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jljd4 (ro)
Containers:
  fe:
    Container ID:
    Image:          docker.io/starrocks/fe-ubuntu:3.2.2
    Image ID:
    Ports:          8030/TCP, 9020/TCP, 9030/TCP, 9010/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      bash -c /opt/starrocks/fe_entrypoint.sh ${FE_DISCOVERY_SERVICE_NAME}
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1100m
      memory:  2Gi
    Requests:
      cpu:      1100m
      memory:   2Gi
    Liveness:   http-get http://:8030/api/health delay=0s timeout=1s period=5s #success=1 #failure=3
    Readiness:  http-get http://:8030/api/health delay=0s timeout=1s period=5s #success=1 #failure=3
    Startup:    http-get http://:8030/api/health delay=0s timeout=1s period=5s #success=1 #failure=60
    Environment Variables from:
      strsent-nerqht-fe-env      ConfigMap  Optional: false
      strsent-nerqht-fe-rsm-env  ConfigMap  Optional: false
    Environment:
      STARROCKS_USER:        <set to the key 'username' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      STARROCKS_PASSWORD:    <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      MYSQL_PWD:             <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      KB_POD_NAME:           strsent-nerqht-fe-0 (v1:metadata.name)
      KB_POD_UID:             (v1:metadata.uid)
      KB_NAMESPACE:          default (v1:metadata.namespace)
      KB_SA_NAME:             (v1:spec.serviceAccountName)
      KB_NODENAME:            (v1:spec.nodeName)
      KB_HOST_IP:             (v1:status.hostIP)
      KB_POD_IP:              (v1:status.podIP)
      KB_POD_IPS:             (v1:status.podIPs)
      KB_HOSTIP:              (v1:status.hostIP)
      KB_PODIP:               (v1:status.podIP)
      KB_PODIPS:              (v1:status.podIPs)
      KB_POD_FQDN:           $(KB_POD_NAME).strsent-nerqht-fe-headless.$(KB_NAMESPACE).svc
      TZ:                    Asia/Shanghai
      POD_NAME:              strsent-nerqht-fe-0 (v1:metadata.name)
      POD_IP:                 (v1:status.podIP)
      HOST_IP:                (v1:status.hostIP)
      POD_NAMESPACE:         default (v1:metadata.namespace)
      HOST_TYPE:             FQDN
      COMPONENT_NAME:        fe
      CONFIGMAP_MOUNT_PATH:  /etc/starrocks/fe/conf
      SERVICE_PORT:          8030
    Mounts:
      /kb_tools from kb-tools (rw)
      /opt/starrocks/fe/conf from fe-cm (rw)
      /opt/starrocks/fe/log from log (rw)
      /opt/starrocks/fe/meta from data (rw)
      /scripts from scripts (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jljd4 (ro)
  lorry:
    Container ID:
    Image:         docker.io/starrocks/fe-ubuntu:3.2.2
    Image ID:
    Ports:         3501/TCP, 50001/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /kubeblocks/lorry --port 3501 --grpcport 50001 --config-path /kubeblocks/config/lorry/components/
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Startup:   tcp-socket :3501 delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      strsent-nerqht-fe-env      ConfigMap  Optional: false
      strsent-nerqht-fe-rsm-env  ConfigMap  Optional: false
    Environment:
      STARROCKS_USER:        <set to the key 'username' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      STARROCKS_PASSWORD:    <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      MYSQL_PWD:             <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      KB_POD_NAME:           strsent-nerqht-fe-0 (v1:metadata.name)
      KB_POD_UID:             (v1:metadata.uid)
      KB_NAMESPACE:          default (v1:metadata.namespace)
      KB_SA_NAME:             (v1:spec.serviceAccountName)
      KB_NODENAME:            (v1:spec.nodeName)
      KB_HOST_IP:             (v1:status.hostIP)
      KB_POD_IP:              (v1:status.podIP)
      KB_POD_IPS:             (v1:status.podIPs)
      KB_HOSTIP:              (v1:status.hostIP)
      KB_PODIP:               (v1:status.podIP)
      KB_PODIPS:              (v1:status.podIPs)
      KB_POD_FQDN:           $(KB_POD_NAME).strsent-nerqht-fe-headless.$(KB_NAMESPACE).svc
      KB_BUILTIN_HANDLER:    custom
      KB_SERVICE_USER:       <set to the key 'username' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      KB_SERVICE_PASSWORD:   <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      KB_SERVICE_PORT:       8030
      KB_DATA_PATH:          /opt/starrocks/fe/meta
      KB_ACTION_COMMANDS:    {"memberLeave":["/bin/bash","-c","#!/usr/bin/env bash\n\nset -x\nset -o errexit\n\nleader_host=\"\"\nleave_member_host=\"\"\nleave_member_port=\"\"\nhelper_endpoints=\"\"\ncandidate_names=\"\"\n\nfunction info() {\n echo \"[$(date +'%Y-%m-%d %H:%M:%S')] $*\"\n}\n\n# root@x-fe-0:/opt/starrocks# mysql -h 127.0.0.1 -P 9030 -e \"show frontends\"\n# +-------------------------------------------------------------------------------+------------------------------------------------------------+-------------+----------+-----------+---------+----------+------------+------+-------+-------------------+---------------------+----------+--------+---------------------+---------------+\n# Name IP EditLogPort HttpPort QueryPort RpcPort Role ClusterId Join Alive ReplayedJournalId LastHeartbeat IsHelper ErrMsg StartTime Version \n# +-------------------------------------------------------------------------------+------------------------------------------------------------+-------------+----------+-----------+---------+----------+------------+------+-------+-------------------+---------------------+----------+--------+---------------------+---------------+\n# x-fe-1.x-fe-headless.kubeblocks-cloud-ns.svc.cluster.local_9010_1717662978660 x-fe-1.x-fe-headless.kubeblocks-cloud-ns.svc.cluster.local 9010 8030 9030 9020 FOLLOWER 1847720530 true true 179 2024-06-06 16:42:30 true 2024-06-06 16:36:30 3.2.2-269e832 \n# x-fe-0.x-fe-headless.kubeblocks-cloud-ns.svc.cluster.local_9010_1717662806744 x-fe-0.x-fe-headless.kubeblocks-cloud-ns.svc.cluster.local 9010 8030 9030 9020 LEADER 1847720530 true true 180 2024-06-06 16:42:30 true 2024-06-06 16:33:47 3.2.2-269e832 \n# x-fe-2.x-fe-headless.kubeblocks-cloud-ns.svc.cluster.local_9010_1717662978644 x-fe-2.x-fe-headless.kubeblocks-cloud-ns.svc.cluster.local 9010 8030 9030 9020 FOLLOWER 1847720530 true true 179 2024-06-06 16:42:30 true 2024-06-06 16:36:41 3.2.2-269e832 \n# +-------------------------------------------------------------------------------+------------------------------------------------------------+-------------+----------+-----------+---------+----------+------------+------+-------+-------------------+---------------------+----------+--------+---------------------+---------------+\nfunction show_frontends() {\n mysql -N -B -h 127.0.0.1 -P 9030 -e \"show frontends\"\n}\n\nfunction switch_leader() {\n java -jar /opt/starrocks/fe/lib/starrocks-bdb-je*.jar DbGroupAdmin -helperHosts \"${helper_endpoints}\" -groupName PALO_JOURNAL_GROUP -transferMaster -force \"${candidate_names}\" 5000\n}\n\nfunction wait_for_leader_switched() {\n until [[ $(show_frontends | grep 'LEADER' | awk '{print $2}') != ${KB_LEAVE_MEMBER_POD_NAME}* ]]; do\n sleep 5\n info \"waiting for leader to be switched\"\n done\n}\n\n# execute a mysql command and iterate the output line by line\noutput=$(show_frontends)\nwhile IFS= read -r line; do\n name=$(echo \"$line\" | awk '{print $1}')\n ip=$(echo \"$line\" | awk '{print $2}')\n edit_log_port=$(echo \"$line\" | awk '{print $3}')\n role=$(echo \"$line\" | awk '{print $7}')\n is_leaving=False\n if [[ ${ip} == ${KB_LEAVE_MEMBER_POD_NAME} ]]; then\n is_leaving=True\n leave_member_host=${ip}\n leave_member_port=${edit_log_port}\n fi\n if [ \"${role}\" == \"LEADER\" ]; then\n leader_host=${ip}\n fi\n if [ \"${is_leaving}\" == \"False\" ]; then\n if [ -n \"${helper_endpoints}\" ]; then\n helper_endpoints=${helper_endpoints},${ip}:${edit_log_port}\n candidate_names=${candidate_names},${name}\n else\n helper_endpoints=${ip}:${edit_log_port}\n candidate_names=${name}\n fi\n fi\ndone \u003c\u003c\u003c \"$output\"\n\ninfo \"leave member: ${leave_member_host}:${leave_member_port}\"\ninfo \"leader: ${leader_host}\"\ninfo \"helper hosts: ${helper_endpoints}\"\ninfo \"candidate hosts: ${candidate_names}\"\n\n# The leader will exit if lost it's leader role\nif [[ ${leader_host} == ${KB_LEAVE_MEMBER_POD_NAME} ]]; then\n switch_leader\n wait_for_leader_switched\nfi\n\nmysql -h \"${leader_host}\" -P 9030 -e \"alter system drop follower '${leave_member_host}:${leave_member_port}';\"\n"]}
      TZ:                    Asia/Shanghai
      POD_NAME:              strsent-nerqht-fe-0 (v1:metadata.name)
      POD_IP:                 (v1:status.podIP)
      HOST_IP:                (v1:status.hostIP)
      POD_NAMESPACE:         default (v1:metadata.namespace)
      HOST_TYPE:             FQDN
      COMPONENT_NAME:        fe
      CONFIGMAP_MOUNT_PATH:  /etc/starrocks/fe/conf
      SERVICE_PORT:          8030
    Mounts:
      /kubeblocks from kubeblocks (rw)
      /opt/starrocks/fe/meta from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jljd4 (ro)
  config-manager:
    Container ID:
    Image:         docker.io/apecloud/kubeblocks-tools:0.9.0-beta.39
    Image ID:
    Port:          9901/TCP
    Host Port:     0/TCP
    Command:
      env
    Args:
      PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$(TOOLS_PATH) /bin/reloader --log-level info --operator-update-enable --tcp 9901 --config /opt/config-manager/config-manager.yaml
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Environment Variables from:
      strsent-nerqht-fe-env      ConfigMap  Optional: false
      strsent-nerqht-fe-rsm-env  ConfigMap  Optional: false
    Environment:
      STARROCKS_USER:         <set to the key 'username' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      STARROCKS_PASSWORD:     <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      MYSQL_PWD:              <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      KB_POD_NAME:            strsent-nerqht-fe-0 (v1:metadata.name)
      KB_POD_UID:              (v1:metadata.uid)
      KB_NAMESPACE:           default (v1:metadata.namespace)
      KB_SA_NAME:              (v1:spec.serviceAccountName)
      KB_NODENAME:             (v1:spec.nodeName)
      KB_HOST_IP:              (v1:status.hostIP)
      KB_POD_IP:               (v1:status.podIP)
      KB_POD_IPS:              (v1:status.podIPs)
      KB_HOSTIP:               (v1:status.hostIP)
      KB_PODIP:                (v1:status.podIP)
      KB_PODIPS:               (v1:status.podIPs)
      KB_POD_FQDN:            $(KB_POD_NAME).strsent-nerqht-fe-headless.$(KB_NAMESPACE).svc
      CONFIG_MANAGER_POD_IP:   (v1:status.podIP)
      TOOLS_PATH:             /opt/kb-tools/reload/fe-cm:/opt/config-manager:/kb_tools
    Mounts:
      /kb_tools from kb-tools (rw)
      /opt/config-manager from config-manager-config (rw)
      /opt/kb-tools/reload/fe-cm from cm-script-fe-cm (rw)
      /opt/starrocks/fe/conf from fe-cm (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jljd4 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  log:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  fe-cm:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      strsent-nerqht-fe-fe-cm
    Optional:  false
  scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      strsent-nerqht-fe-scripts
    Optional:  false
  cm-script-fe-cm:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      sidecar-starrocks-scripts-strsent-nerqht
    Optional:  false
  config-manager-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      sidecar-strsent-nerqht-fe-config-manager-config
    Optional:  false
  kb-tools:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-strsent-nerqht-fe-0
    ReadOnly:   false
  kubeblocks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  kube-api-access-jljd4:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:
Tolerations:                 kb-data=true:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  Normal  Scheduled  21m   default-scheduler  Successfully assigned default/strsent-nerqht-fe-0 to gke-infracreate-gke-kbdata-e2-standar-25c8fd47-9yic
  Normal  Pulled     21m   kubelet            Container image "docker.io/apecloud/kubeblocks-tools:0.9.0-beta.39" already present on machine
  Normal  Created    21m   kubelet            Created container init-lorry
  Normal  Started    21m   kubelet            Started container init-lorry
  Normal  Pulled     20m   kubelet            Container image "docker.io/apecloud/starrocks-tools:3.2.2" already present on machine
  Normal  Created    20m   kubelet            Created container starrocks-tools
  Normal  Started    20m   kubelet            Started container starrocks-tools
  Normal  Pulled     20m   kubelet            Container image "docker.io/starrocks/fe-ubuntu:3.2.2" already present on machine
  Normal  Created    20m   kubelet            Created container fe
  Normal  Started    20m   kubelet            Started container fe
```
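The `memberLeave` action in `KB_ACTION_COMMANDS` parses `show frontends` to find the leader and to build the `-helperHosts` and candidate lists passed to `DbGroupAdmin`. The sketch below reproduces only that parsing step, so it can be tried offline against the sample rows quoted in the script's own comment. It is not the addon's code: it reads tab-separated batch output with `IFS=$'\t'` instead of running `awk` once per field, it matches the leaving member by the `<pod-name>.` prefix of the FQDN in the `IP` column (the script's loop compares `ip` against `KB_LEAVE_MEMBER_POD_NAME` directly, while `wait_for_leader_switched` uses a prefix pattern), and `plan_member_leave` and `row` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch of the memberLeave parsing step (not the addon's code).
# Reads `mysql -N -B -e "show frontends"` output (tab-separated) on stdin.
# $1 stands in for KB_LEAVE_MEMBER_POD_NAME, e.g. "x-fe-0".
plan_member_leave() {
  local leaving=$1 leader="" helpers="" candidates=""
  local name ip edit_log_port role
  # Columns: Name IP EditLogPort HttpPort QueryPort RpcPort Role ...
  while IFS=$'\t' read -r name ip edit_log_port _ _ _ role _; do
    [[ -z $name ]] && continue
    [[ $role == LEADER ]] && leader=$ip
    # Skip the leaving member; IP holds an FQDN such as x-fe-0.x-fe-headless...
    [[ $ip == "$leaving".* ]] && continue
    helpers=${helpers:+$helpers,}$ip:$edit_log_port
    candidates=${candidates:+$candidates,}$name
  done
  printf 'leader=%s\nhelpers=%s\ncandidates=%s\n' "$leader" "$helpers" "$candidates"
}

# Join arguments with tabs, mimicking one row of mysql batch output.
row() { local IFS=$'\t'; printf '%s\n' "$*"; }

# Sample rows (first seven columns) from the comment inside the memberLeave script.
d=x-fe-headless.kubeblocks-cloud-ns.svc.cluster.local
frontends=$(
  row "x-fe-1.${d}_9010_1717662978660" "x-fe-1.$d" 9010 8030 9030 9020 FOLLOWER
  row "x-fe-0.${d}_9010_1717662806744" "x-fe-0.$d" 9010 8030 9030 9020 LEADER
  row "x-fe-2.${d}_9010_1717662978644" "x-fe-2.$d" 9010 8030 9030 9020 FOLLOWER
)

plan_member_leave x-fe-0 <<< "$frontends"
```

When the leaving member is the leader (as with `x-fe-0` here), the helper and candidate lists exclude it, which is what `switch_leader` needs in order to transfer leadership elsewhere.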



**Expected behavior**
After the operations finish, both FE pods become ready (`3/3 Running`), the VerticalScaling OpsRequest completes, and the cluster returns to `Running`.

iziang commented 4 days ago

The FE pod has a post-start hook script that sets the root account password. An SQL command in that script is getting stuck: `mysql --connect-timeout=1 -h127.0.0.1 -uroot -P9030 -px xxxxxxxx -e "show databases"`.

[screenshot]

Opening a new connection with the MySQL client also gets stuck.

[screenshot]
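The hook script itself is not included in this report. As a sketch of one way to keep a post-start step from blocking indefinitely, each attempt can be bounded with GNU coreutils `timeout` (exit status 124 when the deadline is hit) and retried a fixed number of times; `--connect-timeout` by itself bounds the connection attempt, not the time a statement takes once connected. The helper name, attempt count and deadline below are hypothetical, and the sketch assumes `timeout` is available in the FE image.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to $1 times, killing any attempt
# that runs longer than $2 seconds. Relies on GNU coreutils `timeout`,
# which exits with status 124 when the deadline is reached.
retry_with_deadline() {
  local attempts=$1 deadline=$2 i rc=1
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    timeout "$deadline" "$@" && return 0
    rc=$?
    echo "attempt $i/$attempts failed with exit status $rc" >&2
    if ((i < attempts)); then sleep 1; fi
  done
  return "$rc"
}

# Illustrative use in a hook (credentials omitted):
#   retry_with_deadline 30 5 mysql --connect-timeout=1 -h127.0.0.1 -uroot -P9030 -e "show databases"
retry_with_deadline 2 1 sleep 3 || echo "gave up with exit status $?"
```

Returning the last exit status lets the caller tell a timeout (124) apart from a client error.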

The fe-1 pod is functioning normally: connecting to it with the MySQL client and running `show frontends` shows that both FEs are operating normally.

[screenshot]
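Instead of reading the `show frontends` result by eye, the `Alive` column (the tenth, per the header quoted in the `memberLeave` script) can be checked mechanically. A minimal sketch, assuming batch output from `mysql -N -B` (tab-separated, no header) as produced by the script's `show_frontends` function; `dead_frontends` and `row` are hypothetical names. Splitting on tabs keeps column numbers stable even though `LastHeartbeat` and `StartTime` contain a space and `ErrMsg` may be empty.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Name of every FE whose Alive column is not
# "true", reading `mysql -N -B -e "show frontends"` output on stdin.
# Column order: Name IP EditLogPort HttpPort QueryPort RpcPort Role
#               ClusterId Join Alive ...
dead_frontends() {
  awk -F'\t' 'NF >= 10 && $10 != "true" { print $1 }'
}

# Join arguments with tabs, mimicking one row of mysql batch output.
row() { local IFS=$'\t'; printf '%s\n' "$*"; }

# Rows modelled on the sample in the memberLeave script comment (columns 1-10).
d=x-fe-headless.kubeblocks-cloud-ns.svc.cluster.local
frontends=$(
  row "x-fe-0.${d}_9010_1717662806744" "x-fe-0.$d" 9010 8030 9030 9020 LEADER 1847720530 true true
  row "x-fe-1.${d}_9010_1717662978660" "x-fe-1.$d" 9010 8030 9030 9020 FOLLOWER 1847720530 true true
)

dead_frontends <<< "$frontends"   # prints nothing: both rows report Alive=true
```

Inside a pod the input would come from `mysql -N -B -h 127.0.0.1 -P 9030 -e "show frontends" | dead_frontends`.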

The log of fe-0: fe.log

The stack of fe-0: stack.log

The GC stats of fe-0: [screenshot]

The JVM flags of fe-0:

```
root@strsent-nerqht-fe-0:/opt/starrocks# jcmd 10 VM.flags
10:
-XX:-AlwaysTenure -XX:CICompilerCount=2 -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:-CMSParallelRemarkEnabled -XX:ConcGCThreads=1 -XX:G1ConcRefinementThreads=2 -XX:G1HeapRegionSize=2097152 -XX:GCDrainStackTargetSize=64 -XX:InitialHeapSize=33554432 -XX:MarkStackSize=4194304 -XX:MaxHeapSize=8589934592 -XX:MaxNewSize=5152702464 -XX:MaxTenuringThreshold=7 -XX:MinHeapDeltaBytes=2097152 -XX:-NeverTenure -XX:NonNMethodCodeHeapSize=5825164 -XX:NonProfiledCodeHeapSize=122916538 -XX:ProfiledCodeHeapSize=122916538 -XX:ReservedCodeCacheSize=251658240 -XX:+SegmentedCodeCache -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=8 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
```
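The heap-related values in these flags are in bytes. Converting them makes them easier to read next to the fe container's 2Gi memory limit from the pod description: `MaxHeapSize=8589934592` is 8 GiB and `MaxNewSize=5152702464` is about 4.80 GiB. A minimal sketch; `flag_gib` is a hypothetical name, and it expects a single line of `jcmd <pid> VM.flags` output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of a byte-valued -XX flag in GiB.
# $1 = flag name (e.g. MaxHeapSize), $2 = the `jcmd <pid> VM.flags` line.
flag_gib() {
  local flag=$1 flags=$2 bytes
  bytes=$(sed -n "s/.*-XX:${flag}=\([0-9][0-9]*\).*/\1/p" <<< "$flags")
  [[ -n $bytes ]] || return 1   # flag not present
  awk -v b="$bytes" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

# Excerpt of the fe-0 flags shown above.
flags='-XX:InitialHeapSize=33554432 -XX:MaxHeapSize=8589934592 -XX:MaxNewSize=5152702464 -XX:+UseG1GC'

for f in InitialHeapSize MaxHeapSize MaxNewSize; do
  printf '%-16s %s GiB\n' "$f" "$(flag_gib "$f" "$flags")"
done
```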

The fe.conf of fe-0: [screenshot]