apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0

[BUG] redis cluster pod crash upgrade from 0.6.2 to 0.8 #6431

Closed JashBook closed 4 months ago

JashBook commented 9 months ago

Describe the bug
The Redis cluster pods crash after upgrading KubeBlocks from 0.6.2 to 0.8.

To Reproduce
Steps to reproduce the behavior:

  1. Install KubeBlocks (kb) 0.6.2.
  2. Create a Redis cluster.
  3. Upgrade KubeBlocks from 0.6.2 to 0.7.2, then to 0.8.
  4. Run these ops in sequence: hscale out to 4 replicas --> stop --> start --> hscale in to 2 replicas --> reconfigure.
  5. See error:
    
    kubectl get cluster
    NAME               CLUSTER-DEFINITION   VERSION           TERMINATION-POLICY   STATUS     AGE
    redis-upkb678      redis                redis-7.0.6       WipeOut              Abnormal   4h56m

    kubectl get pod
    NAME                    READY   STATUS             RESTARTS         AGE
    redis-upkb678-redis-0   2/3     CrashLoopBackOff   12 (3m53s ago)   42m
    redis-upkb678-redis-1   2/3     CrashLoopBackOff   12 (2m51s ago)   40m
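The pod listing can also be checked mechanically. Below is a minimal Python sketch, assuming the default `kubectl get pod` column layout shown above; `parse_get_pod` and `unhealthy` are hypothetical helpers, not part of kubectl or KubeBlocks. It flags a pod when fewer containers are ready than exist or when its status is CrashLoopBackOff.

```python
import re
from typing import List, NamedTuple

# NAME, READY as "x/y", STATUS, and the leading integer of RESTARTS.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ready>\d+)/(?P<total>\d+)\s+(?P<status>\S+)\s+(?P<restarts>\d+)"
)

class PodRow(NamedTuple):
    name: str
    ready: int
    total: int
    status: str
    restarts: int

def parse_get_pod(output: str) -> List[PodRow]:
    """Parse plain `kubectl get pod` text output (header row first)."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        m = ROW.match(line.strip())
        if m:
            rows.append(PodRow(m["name"], int(m["ready"]), int(m["total"]),
                               m["status"], int(m["restarts"])))
    return rows

def unhealthy(rows: List[PodRow]) -> List[str]:
    """Names of pods that are not fully ready or are crash-looping."""
    return [r.name for r in rows if r.ready < r.total or r.status == "CrashLoopBackOff"]

# Output copied from this issue.
SAMPLE = """NAME                    READY   STATUS             RESTARTS         AGE
redis-upkb678-redis-0   2/3     CrashLoopBackOff   12 (3m53s ago)   42m
redis-upkb678-redis-1   2/3     CrashLoopBackOff   12 (2m51s ago)   40m"""

if __name__ == "__main__":
    print(unhealthy(parse_get_pod(SAMPLE)))
```

Parsing text columns is fragile; for anything beyond a quick check, `kubectl get pod -o json` returns the same data in structured form.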

describe cluster 

kubectl describe cluster redis-upkb678
Name:         redis-upkb678
Namespace:    default
Labels:       app.kubernetes.io/instance=redis-upkb678
              clusterdefinition.kubeblocks.io/name=redis
              clusterversion.kubeblocks.io/name=redis-7.0.6
Annotations:  kubeblocks.io/ops-request: [{"name":"redis-upkb678-reconfiguring-r5gt6","type":"Reconfiguring"}]
              kubeblocks.io/reconcile: 2024-01-11T11:24:47.736690603Z
API Version:  apps.kubeblocks.io/v1alpha1
Kind:         Cluster
Metadata:
  Creation Timestamp:  2024-01-11T06:29:02Z
  Finalizers:
    cluster.kubeblocks.io/finalizer
  Generation:  9
  Managed Fields:
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:affinity:
          .:
          f:podAntiAffinity:
          f:tenancy:
        f:clusterDefinitionRef:
        f:clusterVersionRef:
        f:componentSpecs:
          .:
          k:{"name":"redis"}:
            .:
            f:componentDefRef:
            f:name:
            f:noCreatePDB:
            f:resources:
              .:
              f:limits:
                .:
                f:cpu:
                f:memory:
              f:requests:
                .:
                f:cpu:
                f:memory:
            f:serviceAccountName:
            f:switchPolicy:
              .:
              f:type:
          k:{"name":"redis-sentinel"}:
            .:
            f:componentDefRef:
            f:name:
            f:noCreatePDB:
            f:resources:
              .:
              f:limits:
                .:
                f:cpu:
                f:memory:
              f:requests:
                .:
                f:cpu:
                f:memory:
            f:serviceAccountName:
            f:volumeClaimTemplates:
        f:terminationPolicy:
    Manager:      kbcli_0.6.2
    Operation:    Update
    Time:         2024-01-11T06:29:02Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:app.kubernetes.io/instance:
      f:spec:
        f:componentSpecs:
          k:{"name":"redis"}:
            f:monitor:
          k:{"name":"redis-sentinel"}:
            f:monitor:
    Manager:      kbcli
    Operation:    Update
    Time:         2024-01-11T10:33:25Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:componentSpecs:
          k:{"name":"redis"}:
            f:replicas:
    Manager:      kubectl-edit
    Operation:    Update
    Time:         2024-01-11T10:57:26Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:clusterDefGeneration:
        f:components:
          .:
          f:redis:
            .:
            f:membersStatus:
            f:message:
              .:
              f:Pod/redis-upkb678-redis-0:
              f:Pod/redis-upkb678-redis-1:
            f:phase:
            f:podsReady:
            f:podsReadyTime:
            f:replicationSetStatus:
              .:
              f:primary:
                .:
                f:pod:
              f:secondaries:
          f:redis-sentinel:
            .:
            f:phase:
            f:podsReady:
            f:podsReadyTime:
        f:conditions:
        f:observedGeneration:
        f:phase:
    Manager:      manager
    Operation:    Update
    Subresource:  status
    Time:         2024-01-11T10:57:29Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubeblocks.io/ops-request:
          f:kubeblocks.io/reconcile:
        f:finalizers:
          .:
          v:"cluster.kubeblocks.io/finalizer":
        f:labels:
          .:
          f:clusterdefinition.kubeblocks.io/name:
          f:clusterversion.kubeblocks.io/name:
      f:spec:
        f:componentSpecs:
          k:{"name":"redis"}:
            f:volumeClaimTemplates:
          k:{"name":"redis-sentinel"}:
            f:replicas:
        f:monitor:
        f:resources:
          .:
          f:cpu:
          f:memory:
        f:storage:
          .:
          f:size:
    Manager:         manager
    Operation:       Update
    Time:            2024-01-11T11:24:47Z
  Resource Version:  214793820
  UID:               d506a601-1c14-45a8-a7e2-8a10f1c86614
Spec:
  Affinity:
    Pod Anti Affinity:     Preferred
    Tenancy:               SharedNode
  Cluster Definition Ref:  redis
  Cluster Version Ref:     redis-7.0.6
  Component Specs:
    Component Def Ref:  redis
    Monitor:            true
    Name:               redis
    No Create PDB:      false
    Replicas:           2
    Resources:
      Limits:
        Cpu:     100m
        Memory:  512Mi
      Requests:
        Cpu:               100m
        Memory:            512Mi
    Rsm Transform Policy:  ToSts
    Service Account Name:  kb-redis-upkb678
    Switch Policy:
      Type:  Noop
    Volume Claim Templates:
      Name:  data
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:    4Gi
    Component Def Ref:  redis-sentinel
    Monitor:            true
    Name:               redis-sentinel
    No Create PDB:      false
    Replicas:           3
    Resources:
      Limits:
        Cpu:     100m
        Memory:  512Mi
      Requests:
        Cpu:               100m
        Memory:            512Mi
    Rsm Transform Policy:  ToSts
    Service Account Name:  kb-redis-upkb678
    Volume Claim Templates:
      Name:  data
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:  1Gi
  Monitor:
  Resources:
    Cpu:     0
    Memory:  0
  Storage:
    Size:              0
  Termination Policy:  WipeOut
Status:
  Cluster Def Generation:  4
  Components:
    Redis:
      Members Status:
        Pod Name:  redis-upkb678-redis-0
        Role:
          Access Mode:  ReadWrite
          Can Vote:     true
          Is Leader:    true
          Name:         primary
        Pod Name:       redis-upkb678-redis-1
        Role:
          Access Mode:  Readonly
          Can Vote:     true
          Is Leader:    false
          Name:         secondary
      Message:
        Pod/redis-upkb678-redis-0:  back-off 5m0s restarting failed container=redis pod=redis-upkb678-redis-0_default(75d063f9-e39c-4854-a14e-4f05c5b52aa5)
        Pod/redis-upkb678-redis-1:  back-off 5m0s restarting failed container=redis pod=redis-upkb678-redis-1_default(7746e76c-95c5-4502-b417-27c0ffc4dd43)
      Phase:                        Failed
      Pods Ready:                   false
      Pods Ready Time:              2024-01-11T10:33:00Z
      Replication Set Status:
        Primary:
          Pod:  redis-upkb678-redis-0
        Secondaries:
          Pod:  redis-upkb678-redis-1
    Redis - Sentinel:
      Phase:            Running
      Pods Ready:       true
      Pods Ready Time:  2024-01-11T10:57:27Z
  Conditions:
    Last Transition Time:  2024-01-11T06:29:03Z
    Message:               The operator has started the provisioning of Cluster: redis-upkb678
    Observed Generation:   9
    Reason:                PreCheckSucceed
    Status:                True
    Type:                  ProvisioningStarted
    Last Transition Time:  2024-01-11T06:39:58Z
    Message:               Successfully applied for resources
    Observed Generation:   9
    Reason:                ApplyResourcesSucceed
    Status:                True
    Type:                  ApplyResources
    Last Transition Time:  2024-01-11T10:47:28Z
    Message:               pods are not ready in Components: [redis], refer to related component message in Cluster.status.components
    Reason:                ReplicasNotReady
    Status:                False
    Type:                  ReplicasReady
    Last Transition Time:  2024-01-11T10:47:28Z
    Message:               pods are unavailable in Components: [redis], refer to related component message in Cluster.status.components
    Reason:                ComponentsNotReady
    Status:                False
    Type:                  Ready
  Observed Generation:     9
  Phase:                   Abnormal
Events:
  Type     Reason                    Age                  From                  Message
  ----     ------                    ----                 ----                  -------


  Normal   AllReplicasReady          54m (x6 over 115m)   cluster-controller    all pods of components are ready, waiting for the probe detection successful
  Normal   Running                   54m (x5 over 103m)   cluster-controller    Cluster: redis-upkb678 is ready, current phase is Running
  Normal   ClusterReady              54m (x5 over 103m)   cluster-controller    Cluster: redis-upkb678 is ready, current phase is Running
  Normal   ComponentPhaseTransition  54m (x10 over 115m)  cluster-controller    component is Running
  Normal   ComponentPhaseTransition  53m (x9 over 104m)   cluster-controller    component is Updating
  Normal   ComponentPhaseTransition  38m                  cluster-controller    component is Failed
  Warning  ComponentsNotReady        38m (x7 over 115m)   cluster-controller    pods are unavailable in Components: [redis], refer to related component message in Cluster.status.components
  Warning  ReplicasNotReady          38m (x7 over 115m)   cluster-controller    pods are not ready in Components: [redis], refer to related component message in Cluster.status.components
  Normal   HorizontalScale           33m (x3 over 102m)   component-controller  start horizontal scale component redis of cluster redis-upkb678 from 2 to 4
  Normal   PreCheckSucceed           28m (x8 over 104m)   cluster-controller    The operator has started the provisioning of Cluster: redis-upkb678
  Warning  Unhealthy                 10m (x2 over 27m)    event-controller      Pod redis-upkb678-redis-sentinel-1: Readiness probe failed: Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
                                                                                timeout: the monitored command dumped core
  Warning  Unhealthy                 10m (x2 over 23m)    event-controller      Pod redis-upkb678-redis-sentinel-2: Readiness probe errored: command "sh -c /scripts/redis-sentinel-ping.sh 1" timed out
  Warning  BackOff                   5m3s (x8 over 37m)   event-controller      Pod redis-upkb678-redis-1: Back-off restarting failed container redis in pod redis-upkb678-redis-1_default(7746e76c-95c5-4502-b417-27c0ffc4dd43)
  Warning  Unhealthy                 2m34s (x10 over 27m) event-controller      Pod redis-upkb678-redis-sentinel-1: Readiness probe errored: command "sh -c /scripts/redis-sentinel-ping.sh 1" timed out
  Warning  BackOff                   62s (x10 over 38m)   event-controller      Pod redis-upkb678-redis-0: Back-off restarting failed container redis in pod redis-upkb678-redis-0_default(75d063f9-e39c-4854-a14e-4f05c5b52aa5)
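The Warning events can be tallied to see which failure dominates. A minimal Python sketch, assuming the `kubectl describe` event row layout (Type, Reason, Age, From, Message) and treating the `(xN over ...)` suffix in the Age column as the number of occurrences; `warning_reasons` is a hypothetical helper, and the sample rows are abridged from the events above.

```python
import re
from collections import Counter

# Type and Reason columns at the start of a `kubectl describe` event row.
EVENT_ROW = re.compile(r"^\s*(Normal|Warning)\s+(\S+)\s")
# Repetition suffix in the Age column, e.g. "(x8 over 37m)".
REPEAT = re.compile(r"\(x(\d+) over ")

def warning_reasons(events_text: str) -> Counter:
    """Count Warning events per Reason, weighting each row by its (xN over ...) count."""
    counts: Counter = Counter()
    for line in events_text.splitlines():
        m = EVENT_ROW.match(line)
        if not m or m.group(1) != "Warning":
            continue  # skips Normal rows and wrapped message lines
        rep = REPEAT.search(line)
        counts[m.group(2)] += int(rep.group(1)) if rep else 1
    return counts

# Rows abridged from the cluster events in this issue (messages shortened).
SAMPLE = """\
  Warning  ComponentsNotReady        38m (x7 over 115m)   cluster-controller    pods are unavailable in Components: [redis]
  Normal   HorizontalScale           33m (x3 over 102m)   component-controller  start horizontal scale component redis of cluster redis-upkb678 from 2 to 4
  Warning  BackOff                   5m3s (x8 over 37m)   event-controller      Pod redis-upkb678-redis-1: Back-off restarting failed container redis
  Warning  BackOff                   62s (x10 over 38m)   event-controller      Pod redis-upkb678-redis-0: Back-off restarting failed container redis
"""

if __name__ == "__main__":
    print(warning_reasons(SAMPLE).most_common())  # [('BackOff', 18), ('ComponentsNotReady', 7)]
```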


logs pod

kubectl logs redis-upkb678-redis-0 redis
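The issue records the `kubectl logs` invocation but not its output. For a container in CrashLoopBackOff, the logs of the last terminated instance are usually the informative ones, and `kubectl logs --previous` returns them. A small Python wrapper as a sketch; the helper names are hypothetical, and it assumes `kubectl` is on `PATH` with access to the cluster.

```python
import subprocess
from typing import List

def previous_logs_cmd(pod: str, container: str, namespace: str = "default") -> List[str]:
    """Build the kubectl invocation for the last terminated instance of a container."""
    # --previous selects the prior (crashed) container instance instead of the current one.
    return ["kubectl", "logs", pod, "-c", container, "-n", namespace, "--previous"]

def fetch_previous_logs(pod: str, container: str, namespace: str = "default") -> str:
    """Run kubectl and return its stdout, or its stderr if the command failed."""
    result = subprocess.run(
        previous_logs_cmd(pod, container, namespace),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout if result.returncode == 0 else result.stderr
```

Calling `fetch_previous_logs("redis-upkb678-redis-0", "redis")` corresponds to the command above with `--previous` added.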

describe pod

 kubectl describe pod redis-upkb678-redis-0
Name:         redis-upkb678-redis-0
Namespace:    default
Priority:     0
Node:         gke-infracreate-gke-kbdata-e2-standar-765d90c7-9xqc/10.10.0.17
Start Time:   Thu, 11 Jan 2024 18:44:37 +0800
Labels:       app.kubernetes.io/component=redis
              app.kubernetes.io/instance=redis-upkb678
              app.kubernetes.io/managed-by=kubeblocks
              app.kubernetes.io/name=redis
              app.kubernetes.io/version=redis-7.0.6
              apps.kubeblocks.io/component-name=redis
              apps.kubeblocks.io/workload-type=Replication
              controller-revision-hash=redis-upkb678-redis-64486f6db9
              kubeblocks.io/role=secondary
              rsm.workloads.kubeblocks.io/access-mode=Readonly
              statefulset.kubernetes.io/pod-name=redis-upkb678-redis-0
Annotations:  apps.kubeblocks.io/component-replicas: 2
              apps.kubeblocks.io/last-role-snapshot-version: 2024-01-11T10:45:41.143964Z
              config.kubeblocks.io/restart-redis-replication-config: 58c8445448
              kubeblocks.io/restart: 2024-01-11T09:45:34Z
              rs.apps.kubeblocks.io/primary: redis-upkb678-redis-3
Status:       Running
IP:           10.128.25.66
IPs:
  IP:           10.128.25.66
Controlled By:  StatefulSet/redis-upkb678-redis
Containers:
  redis:
    Container ID:  containerd://d930318c232a2c4d0f1d9384c67f8b8adae706ffe269b9d1cad592096b9121a0
    Image:         infracreate-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/redis-stack-server:7.0.6-RC8
    Image ID:      infracreate-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/redis-stack-server@sha256:511808b267ab8d800283604ef5c01f4fe94792bfb746bb6dba236cc29ff5495b
    Port:          6379/TCP
    Host Port:     0/TCP
    Command:
      /scripts/redis-start.sh
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 11 Jan 2024 19:27:57 +0800
      Finished:     Thu, 11 Jan 2024 19:27:57 +0800
    Ready:          False
    Restart Count:  13
    Limits:
      cpu:     100m
      memory:  512Mi
    Requests:
      cpu:      100m
      memory:   512Mi
    Readiness:  exec [sh -c /scripts/redis-ping.sh 1] delay=10s timeout=1s period=5s #success=1 #failure=5
    Environment Variables from:
      redis-upkb678-redis-env      ConfigMap  Optional: false
      redis-upkb678-redis-rsm-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:               redis-upkb678-redis-0 (v1:metadata.name)
      KB_POD_UID:                 (v1:metadata.uid)
      KB_NAMESPACE:              default (v1:metadata.namespace)
      KB_SA_NAME:                 (v1:spec.serviceAccountName)
      KB_NODENAME:                (v1:spec.nodeName)
      KB_HOST_IP:                 (v1:status.hostIP)
      KB_POD_IP:                  (v1:status.podIP)
      KB_POD_IPS:                 (v1:status.podIPs)
      KB_HOSTIP:                  (v1:status.hostIP)
      KB_PODIP:                   (v1:status.podIP)
      KB_PODIPS:                  (v1:status.podIPs)
      KB_CLUSTER_NAME:           redis-upkb678
      KB_COMP_NAME:              redis
      KB_CLUSTER_COMP_NAME:      redis-upkb678-redis
      KB_CLUSTER_UID_POSTFIX_8:  f1c86614
      KB_POD_FQDN:               $(KB_POD_NAME).$(KB_CLUSTER_COMP_NAME)-headless.$(KB_NAMESPACE).svc
      SERVICE_PORT:              6379
      REDIS_REPL_USER:           kbreplicator
      REDIS_REPL_PASSWORD:       <set to the key 'password' in secret 'redis-upkb678-conn-credential'>  Optional: false
      REDIS_DEFAULT_USER:        <set to the key 'username' in secret 'redis-upkb678-conn-credential'>  Optional: false
      REDIS_DEFAULT_PASSWORD:    <set to the key 'password' in secret 'redis-upkb678-conn-credential'>  Optional: false
      REDIS_SENTINEL_USER:       $(REDIS_REPL_USER)-sentinel
      REDIS_SENTINEL_PASSWORD:   <set to the key 'password' in secret 'redis-upkb678-conn-credential'>  Optional: false
      REDIS_ARGS:                --requirepass $(REDIS_PASSWORD)
    Mounts:
      /data from data (rw)
      /etc/conf from redis-config (rw)
      /etc/redis from redis-conf (rw)
      /kb-podinfo from pod-info (rw)
      /scripts from scripts (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lsw7m (ro)
  metrics:
    Container ID:  containerd://923832c9b849fa3a511aab46bd9b10864d38cdb203a8e26ce9529914b68233b2
    Image:         infracreate-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/agamotto:0.1.2-beta.1
    Image ID:      infracreate-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/agamotto@sha256:cbab349b90490807a8d5039bf01bc7e37334f20c98c7dd75bc7fc4cf9e5b10ee
    Port:          9121/TCP
    Host Port:     0/TCP
    Command:
      /bin/agamotto
      --config=/opt/conf/metrics-config.yaml
    State:          Running
      Started:      Thu, 11 Jan 2024 18:44:39 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Environment Variables from:
      redis-upkb678-redis-env      ConfigMap  Optional: false
      redis-upkb678-redis-rsm-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:               redis-upkb678-redis-0 (v1:metadata.name)
      KB_POD_UID:                 (v1:metadata.uid)
      KB_NAMESPACE:              default (v1:metadata.namespace)
      KB_SA_NAME:                 (v1:spec.serviceAccountName)
      KB_NODENAME:                (v1:spec.nodeName)
      KB_HOST_IP:                 (v1:status.hostIP)
      KB_POD_IP:                  (v1:status.podIP)
      KB_POD_IPS:                 (v1:status.podIPs)
      KB_HOSTIP:                  (v1:status.hostIP)
      KB_PODIP:                   (v1:status.podIP)
      KB_PODIPS:                  (v1:status.podIPs)
      KB_CLUSTER_NAME:           redis-upkb678
      KB_COMP_NAME:              redis
      KB_CLUSTER_COMP_NAME:      redis-upkb678-redis
      KB_CLUSTER_UID_POSTFIX_8:  f1c86614
      KB_POD_FQDN:               $(KB_POD_NAME).$(KB_CLUSTER_COMP_NAME)-headless.$(KB_NAMESPACE).svc
      ENDPOINT:                  localhost:6379
      REDIS_USER:                <set to the key 'username' in secret 'redis-upkb678-conn-credential'>  Optional: false
      REDIS_PASSWORD:            <set to the key 'password' in secret 'redis-upkb678-conn-credential'>  Optional: false
    Mounts:
      /opt/conf from redis-metrics-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lsw7m (ro)
  kb-checkrole:
    Container ID:  containerd://6025816a0915c1d5be2a1d7ed44df98f21c3a25653395946a0e10e6486e46aa7
    Image:         docker.io/apecloud/kubeblocks-tools:0.7.2
    Image ID:      docker.io/apecloud/kubeblocks-tools@sha256:4861c329f2d9b4d8bf1fb2c557838d2f21b788439b65c5aadc673cc5ee8c4f43
    Ports:         3501/TCP, 50001/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      lorry
      --port
      3501
      --grpcport
      50001
    State:          Running
      Started:      Thu, 11 Jan 2024 18:44:39 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:      0
      memory:   0
    Readiness:  exec [/bin/grpc_health_probe -addr=:50001] delay=0s timeout=1s period=2s #success=1 #failure=2
    Startup:    tcp-socket :3501 delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      redis-upkb678-redis-env      ConfigMap  Optional: false
      redis-upkb678-redis-rsm-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:                   redis-upkb678-redis-0 (v1:metadata.name)
      KB_POD_UID:                     (v1:metadata.uid)
      KB_NAMESPACE:                  default (v1:metadata.namespace)
      KB_SA_NAME:                     (v1:spec.serviceAccountName)
      KB_NODENAME:                    (v1:spec.nodeName)
      KB_HOST_IP:                     (v1:status.hostIP)
      KB_POD_IP:                      (v1:status.podIP)
      KB_POD_IPS:                     (v1:status.podIPs)
      KB_HOSTIP:                      (v1:status.hostIP)
      KB_PODIP:                       (v1:status.podIP)
      KB_PODIPS:                      (v1:status.podIPs)
      KB_CLUSTER_NAME:               redis-upkb678
      KB_COMP_NAME:                  redis
      KB_CLUSTER_COMP_NAME:          redis-upkb678-redis
      KB_CLUSTER_UID_POSTFIX_8:      f1c86614
      KB_POD_FQDN:                   $(KB_POD_NAME).$(KB_CLUSTER_COMP_NAME)-headless.$(KB_NAMESPACE).svc
      KB_SERVICE_USER:               <set to the key 'username' in secret 'redis-upkb678-conn-credential'>  Optional: false
      KB_SERVICE_PASSWORD:           <set to the key 'password' in secret 'redis-upkb678-conn-credential'>  Optional: false
      KB_SERVICE_PORT:               6379
      KB_DATA_PATH:                  /data
      KB_SERVICE_CHARACTER_TYPE:     redis
      KB_WORKLOAD_TYPE:              Replication
      KB_SERVICE_USER:               <set to the key 'username' in secret 'redis-upkb678-conn-credential'>  Optional: false
      KB_SERVICE_PASSWORD:           <set to the key 'password' in secret 'redis-upkb678-conn-credential'>  Optional: false
      KB_RSM_USERNAME:               <set to the key 'username' in secret 'redis-upkb678-conn-credential'>  Optional: false
      KB_RSM_PASSWORD:               <set to the key 'password' in secret 'redis-upkb678-conn-credential'>  Optional: false
      KB_RSM_ACTION_SVC_LIST:        null
      KB_RSM_SERVICE_PORT:           6379
      KB_RSM_ROLE_UPDATE_MECHANISM:  DirectAPIServerEventUpdate
      KB_RSM_ROLE_PROBE_TIMEOUT:     1
    Mounts:
      /data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lsw7m (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-redis-upkb678-redis-0
    ReadOnly:   false
  pod-info:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels['kubeblocks.io/role'] -> pod-role
      metadata.annotations['rs.apps.kubeblocks.io/primary'] -> primary-pod
      metadata.annotations['apps.kubeblocks.io/component-replicas'] -> component-replicas
  redis-metrics-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      redis-upkb678-redis-redis-metrics-config
    Optional:  false
  redis-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      redis-upkb678-redis-redis-replication-config
    Optional:  false
  scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      redis-upkb678-redis-redis-scripts
    Optional:  false
  redis-conf:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-lsw7m:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 kb-data=true:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  45m                 default-scheduler  Successfully assigned default/redis-upkb678-redis-0 to gke-infracreate-gke-kbdata-e2-standar-765d90c7-9xqc
  Normal   Started    45m                 kubelet            Started container metrics
  Normal   Started    45m                 kubelet            Started container kb-checkrole
  Normal   Created    45m                 kubelet            Created container kb-checkrole
  Normal   Pulled     45m                 kubelet            Container image "infracreate-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/agamotto:0.1.2-beta.1" already present on machine
  Normal   Created    45m                 kubelet            Created container metrics
  Normal   Pulled     45m                 kubelet            Container image "docker.io/apecloud/kubeblocks-tools:0.7.2" already present on machine
  Normal   checkRole  44m                 sqlchannel         {"event":"Failed","message":"role check delay","operation":"checkRole","originalRole":""}
  Normal   checkRole  44m                 sqlchannel         {"event":"Success","operation":"checkRole","originalRole":"","role":"secondary"}
  Normal   checkRole  43m                 sqlchannel         {"event":"Failed","message":"dial tcp 127.0.0.1:6379: connect: connection refused","operation":"checkRole","originalRole":"secondary"}
  Normal   Created    42m (x4 over 45m)   kubelet            Created container redis
  Normal   Pulled     42m (x4 over 45m)   kubelet            Container image "infracreate-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/redis-stack-server:7.0.6-RC8" already present on machine
  Normal   Started    42m (x4 over 45m)   kubelet            Started container redis
  Warning  BackOff    1s (x213 over 43m)  kubelet            Back-off restarting failed container redis in pod redis-upkb678-redis-0_default(75d063f9-e39c-4854-a14e-4f05c5b52aa5)
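Two inconsistencies in this output are worth checking (observations, not a confirmed root cause). First, the pod annotation `rs.apps.kubeblocks.io/primary` still names `redis-upkb678-redis-3`, while `apps.kubeblocks.io/component-replicas` is 2, so after the scale-in the StatefulSet only owns ordinals 0 and 1. The `pod-info` volume projects both annotations into `/kb-podinfo/primary-pod` and `/kb-podinfo/component-replicas`; if `/scripts/redis-start.sh` relies on them (the script is not shown in the issue), it would be pointed at a pod that no longer exists. Second, `Cluster.status` lists `redis-upkb678-redis-0` as primary, but its `kubeblocks.io/role` label says `secondary`, as does the label on `redis-upkb678-redis-1` in the next describe output. A minimal Python sketch of both checks, with values transcribed from this issue; the helper names are hypothetical, and it assumes plain StatefulSet naming (`<statefulset>-<ordinal>`, ordinals 0 to N-1).

```python
from typing import Dict, List

def ordinal(pod_name: str) -> int:
    """StatefulSet pods are named <statefulset-name>-<ordinal>."""
    return int(pod_name.rsplit("-", 1)[1])

def primary_is_stale(primary_pod: str, replicas: int, sts_name: str) -> bool:
    """True if the recorded primary is not one of the StatefulSet's current pods.

    Assumes ordinals 0..replicas-1 (no custom start ordinal).
    """
    if not primary_pod.startswith(sts_name + "-"):
        return True
    return ordinal(primary_pod) >= replicas

def role_mismatches(pod_labels: Dict[str, str], status_roles: Dict[str, str]) -> List[str]:
    """Pods whose kubeblocks.io/role label disagrees with Cluster.status membersStatus."""
    return sorted(p for p, role in status_roles.items() if pod_labels.get(p) != role)

STS = "redis-upkb678-redis"
# Values transcribed from the describe output in this issue.
PRIMARY_ANNOTATION = "redis-upkb678-redis-3"
COMPONENT_REPLICAS = 2
STATUS_ROLES = {"redis-upkb678-redis-0": "primary", "redis-upkb678-redis-1": "secondary"}
POD_LABELS = {"redis-upkb678-redis-0": "secondary", "redis-upkb678-redis-1": "secondary"}

if __name__ == "__main__":
    print(primary_is_stale(PRIMARY_ANNOTATION, COMPONENT_REPLICAS, STS))  # True
    print(role_mismatches(POD_LABELS, STATUS_ROLES))  # ['redis-upkb678-redis-0']
```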

kubectl describe pod redis-upkb678-redis-1
Name:         redis-upkb678-redis-1
Namespace:    default
Priority:     0
Node:         gke-infracreate-gke-kbdata-e2-standar-765d90c7-8tgk/10.10.0.22
Start Time:   Thu, 11 Jan 2024 18:45:41 +0800
Labels:       app.kubernetes.io/component=redis
              app.kubernetes.io/instance=redis-upkb678
              app.kubernetes.io/managed-by=kubeblocks
              app.kubernetes.io/name=redis
              app.kubernetes.io/version=redis-7.0.6
              apps.kubeblocks.io/component-name=redis
              apps.kubeblocks.io/workload-type=Replication
              controller-revision-hash=redis-upkb678-redis-64486f6db9
              kubeblocks.io/role=secondary
              rsm.workloads.kubeblocks.io/access-mode=Readonly
              statefulset.kubernetes.io/pod-name=redis-upkb678-redis-1
Annotations:  apps.kubeblocks.io/component-replicas: 2
              apps.kubeblocks.io/last-role-snapshot-version: 2024-01-11T10:46:45.242306Z
              config.kubeblocks.io/restart-redis-replication-config: 58c8445448
              kubeblocks.io/restart: 2024-01-11T09:45:34Z
              rs.apps.kubeblocks.io/primary: redis-upkb678-redis-3
Status:       Running
IP:           10.128.5.149
IPs:
  IP:           10.128.5.149
Controlled By:  StatefulSet/redis-upkb678-redis
Containers:
  redis:
    Container ID:  containerd://a84de4d66cc0eff6cb407e77d6b4ebe326e9d05b5bc2f2a9e3c3cfd175e0ba28
    Image:         infracreate-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/redis-stack-server:7.0.6-RC8
    Image ID:      infracreate-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/redis-stack-server@sha256:511808b267ab8d800283604ef5c01f4fe94792bfb746bb6dba236cc29ff5495b
    Port:          6379/TCP
    Host Port:     0/TCP
    Command:
      /scripts/redis-start.sh
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 11 Jan 2024 19:28:46 +0800
      Finished:     Thu, 11 Jan 2024 19:28:47 +0800
    Ready:          False
    Restart Count:  13
    Limits:
      cpu:     100m
      memory:  512Mi
    Requests:
      cpu:      100m
      memory:   512Mi
    Readiness:  exec [sh -c /scripts/redis-ping.sh 1] delay=10s timeout=1s period=5s #success=1 #failure=5
    Environment Variables from:
      redis-upkb678-redis-env      ConfigMap  Optional: false
      redis-upkb678-redis-rsm-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:               redis-upkb678-redis-1 (v1:metadata.name)
      KB_POD_UID:                 (v1:metadata.uid)
      KB_NAMESPACE:              default (v1:metadata.namespace)
      KB_SA_NAME:                 (v1:spec.serviceAccountName)
      KB_NODENAME:                (v1:spec.nodeName)
      KB_HOST_IP:                 (v1:status.hostIP)
      KB_POD_IP:                  (v1:status.podIP)
      KB_POD_IPS:                 (v1:status.podIPs)
      KB_HOSTIP:                  (v1:status.hostIP)
      KB_PODIP:                   (v1:status.podIP)
      KB_PODIPS:                  (v1:status.podIPs)
      KB_CLUSTER_NAME:           redis-upkb678
      KB_COMP_NAME:              redis
      KB_CLUSTER_COMP_NAME:      redis-upkb678-redis
      KB_CLUSTER_UID_POSTFIX_8:  f1c86614
      KB_POD_FQDN:               $(KB_POD_NAME).$(KB_CLUSTER_COMP_NAME)-headless.$(KB_NAMESPACE).svc
      SERVICE_PORT:              6379
      REDIS_REPL_USER:           kbreplicator
      REDIS_REPL_PASSWORD:       <set to the key 'password' in secret 'redis-upkb678-conn-credential'>  Optional: false
      REDIS_DEFAULT_USER:        <set to the key 'username' in secret 'redis-upkb678-conn-credential'>  Optional: false
      REDIS_DEFAULT_PASSWORD:    <set to the key 'password' in secret 'redis-upkb678-conn-credential'>  Optional: false
      REDIS_SENTINEL_USER:       $(REDIS_REPL_USER)-sentinel
      REDIS_SENTINEL_PASSWORD:   <set to the key 'password' in secret 'redis-upkb678-conn-credential'>  Optional: false
      REDIS_ARGS:                --requirepass $(REDIS_PASSWORD)
    Mounts:
      /data from data (rw)
      /etc/conf from redis-config (rw)
      /etc/redis from redis-conf (rw)
      /kb-podinfo from pod-info (rw)
      /scripts from scripts (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pmqh2 (ro)
  metrics:
    Container ID:  containerd://24036e42ca140d266ede7e0da1736f34f565d943c1d6cea616dbe04d10e3042b
    Image:         infracreate-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/agamotto:0.1.2-beta.1
    Image ID:      infracreate-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/agamotto@sha256:cbab349b90490807a8d5039bf01bc7e37334f20c98c7dd75bc7fc4cf9e5b10ee
    Port:          9121/TCP
    Host Port:     0/TCP
    Command:
      /bin/agamotto
      --config=/opt/conf/metrics-config.yaml
    State:          Running
      Started:      Thu, 11 Jan 2024 18:45:44 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Environment Variables from:
      redis-upkb678-redis-env      ConfigMap  Optional: false
      redis-upkb678-redis-rsm-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:               redis-upkb678-redis-1 (v1:metadata.name)
      KB_POD_UID:                 (v1:metadata.uid)
      KB_NAMESPACE:              default (v1:metadata.namespace)
      KB_SA_NAME:                 (v1:spec.serviceAccountName)
      KB_NODENAME:                (v1:spec.nodeName)
      KB_HOST_IP:                 (v1:status.hostIP)
      KB_POD_IP:                  (v1:status.podIP)
      KB_POD_IPS:                 (v1:status.podIPs)
      KB_HOSTIP:                  (v1:status.hostIP)
      KB_PODIP:                   (v1:status.podIP)
      KB_PODIPS:                  (v1:status.podIPs)
      KB_CLUSTER_NAME:           redis-upkb678
      KB_COMP_NAME:              redis
      KB_CLUSTER_COMP_NAME:      redis-upkb678-redis
      KB_CLUSTER_UID_POSTFIX_8:  f1c86614
      KB_POD_FQDN:               $(KB_POD_NAME).$(KB_CLUSTER_COMP_NAME)-headless.$(KB_NAMESPACE).svc
      ENDPOINT:                  localhost:6379
      REDIS_USER:                <set to the key 'username' in secret 'redis-upkb678-conn-credential'>  Optional: false
      REDIS_PASSWORD:            <set to the key 'password' in secret 'redis-upkb678-conn-credential'>  Optional: false
    Mounts:
      /opt/conf from redis-metrics-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pmqh2 (ro)
  kb-checkrole:
    Container ID:  containerd://9391c53b70a56a26415216756ec86eed0bba371e35b9f68f3553071f7a192c6b
    Image:         docker.io/apecloud/kubeblocks-tools:0.7.2
    Image ID:      docker.io/apecloud/kubeblocks-tools@sha256:4861c329f2d9b4d8bf1fb2c557838d2f21b788439b65c5aadc673cc5ee8c4f43
    Ports:         3501/TCP, 50001/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      lorry
      --port
      3501
      --grpcport
      50001
    State:          Running
      Started:      Thu, 11 Jan 2024 18:45:44 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:      0
      memory:   0
    Readiness:  exec [/bin/grpc_health_probe -addr=:50001] delay=0s timeout=1s period=2s #success=1 #failure=2
    Startup:    tcp-socket :3501 delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      redis-upkb678-redis-env      ConfigMap  Optional: false
      redis-upkb678-redis-rsm-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:                   redis-upkb678-redis-1 (v1:metadata.name)
      KB_POD_UID:                     (v1:metadata.uid)
      KB_NAMESPACE:                  default (v1:metadata.namespace)
      KB_SA_NAME:                     (v1:spec.serviceAccountName)
      KB_NODENAME:                    (v1:spec.nodeName)
      KB_HOST_IP:                     (v1:status.hostIP)
      KB_POD_IP:                      (v1:status.podIP)
      KB_POD_IPS:                     (v1:status.podIPs)
      KB_HOSTIP:                      (v1:status.hostIP)
      KB_PODIP:                       (v1:status.podIP)
      KB_PODIPS:                      (v1:status.podIPs)
      KB_CLUSTER_NAME:               redis-upkb678
      KB_COMP_NAME:                  redis
      KB_CLUSTER_COMP_NAME:          redis-upkb678-redis
      KB_CLUSTER_UID_POSTFIX_8:      f1c86614
      KB_POD_FQDN:                   $(KB_POD_NAME).$(KB_CLUSTER_COMP_NAME)-headless.$(KB_NAMESPACE).svc
      KB_SERVICE_USER:               <set to the key 'username' in secret 'redis-upkb678-conn-credential'>  Optional: false
      KB_SERVICE_PASSWORD:           <set to the key 'password' in secret 'redis-upkb678-conn-credential'>  Optional: false
      KB_SERVICE_PORT:               6379
      KB_DATA_PATH:                  /data
      KB_SERVICE_CHARACTER_TYPE:     redis
      KB_WORKLOAD_TYPE:              Replication
      KB_SERVICE_USER:               <set to the key 'username' in secret 'redis-upkb678-conn-credential'>  Optional: false
      KB_SERVICE_PASSWORD:           <set to the key 'password' in secret 'redis-upkb678-conn-credential'>  Optional: false
      KB_RSM_USERNAME:               <set to the key 'username' in secret 'redis-upkb678-conn-credential'>  Optional: false
      KB_RSM_PASSWORD:               <set to the key 'password' in secret 'redis-upkb678-conn-credential'>  Optional: false
      KB_RSM_ACTION_SVC_LIST:        null
      KB_RSM_SERVICE_PORT:           6379
      KB_RSM_ROLE_UPDATE_MECHANISM:  DirectAPIServerEventUpdate
      KB_RSM_ROLE_PROBE_TIMEOUT:     1
    Mounts:
      /data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pmqh2 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-redis-upkb678-redis-1
    ReadOnly:   false
  pod-info:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels['kubeblocks.io/role'] -> pod-role
      metadata.annotations['rs.apps.kubeblocks.io/primary'] -> primary-pod
      metadata.annotations['apps.kubeblocks.io/component-replicas'] -> component-replicas
  redis-metrics-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      redis-upkb678-redis-redis-metrics-config
    Optional:  false
  redis-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      redis-upkb678-redis-redis-replication-config
    Optional:  false
  scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      redis-upkb678-redis-redis-scripts
    Optional:  false
  redis-conf:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-pmqh2:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 kb-data=true:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  44m                    default-scheduler  Successfully assigned default/redis-upkb678-redis-1 to gke-infracreate-gke-kbdata-e2-standar-765d90c7-8tgk
  Normal   Pulled     44m                    kubelet            Container image "infracreate-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/agamotto:0.1.2-beta.1" already present on machine
  Normal   Started    44m                    kubelet            Started container metrics
  Normal   Started    44m                    kubelet            Started container kb-checkrole
  Normal   Created    44m                    kubelet            Created container kb-checkrole
  Normal   Created    44m                    kubelet            Created container metrics
  Normal   Pulled     44m                    kubelet            Container image "docker.io/apecloud/kubeblocks-tools:0.7.2" already present on machine
  Normal   checkRole  43m                    sqlchannel         {"event":"Failed","message":"role check delay","operation":"checkRole","originalRole":""}
  Normal   checkRole  43m                    sqlchannel         {"event":"Success","operation":"checkRole","originalRole":"","role":"secondary"}
  Normal   checkRole  42m                    sqlchannel         {"event":"Failed","message":"dial tcp 127.0.0.1:6379: connect: connection refused","operation":"checkRole","originalRole":"secondary"}
  Normal   Created    41m (x4 over 44m)      kubelet            Created container redis
  Normal   Pulled     41m (x4 over 44m)      kubelet            Container image "infracreate-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/redis-stack-server:7.0.6-RC8" already present on machine
  Normal   Started    41m (x4 over 44m)      kubelet            Started container redis
  Warning  BackOff    3m59s (x190 over 42m)  kubelet            Back-off restarting failed container redis in pod redis-upkb678-redis-1_default(7746e76c-95c5-4502-b417-27c0ffc4dd43)

Expected behavior: the redis cluster runs normally after the upgrade and ops operations.

github-actions[bot] commented 8 months ago

This issue has been marked as stale because it has been open for 30 days with no activity