apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0

[BUG] Data is lost after the primary and standby cluster is restarted #6535

Open JashBook opened 5 months ago

JashBook commented 5 months ago

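Describe the bug After a failover, a stop/start cycle, and a horizontal scale-out of a Redis primary/standby cluster, a key written before these operations is gone: `get mykey` returns `(nil)`. Both Redis instances also report the `primary` role at the same time.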

To Reproduce Steps to reproduce the behavior:

  1. create cluster
    apiVersion: apps.kubeblocks.io/v1alpha1
    kind: Cluster
    metadata:
      name: redis-nhdkgq
      namespace: default
    spec:
      clusterDefinitionRef: redis
      clusterVersionRef: redis-7.0.6
      terminationPolicy: WipeOut
      componentSpecs:
        - name: redis
          componentDef: redis
          replicas: 2
          resources:
            requests:
              cpu: 100m
              memory: 0.5Gi
            limits:
              cpu: 100m
              memory: 0.5Gi
          switchPolicy:
            type: Noop
          volumeClaimTemplates:
            - name: data
              spec:
                storageClassName:
                accessModes:
                  - ReadWriteOnce
                resources:
                  requests:
                    storage: 1Gi
        - name: redis-sentinel
          componentDef: redis-sentinel
          replicas: 3
          resources:
            requests:
              cpu: 100m
              memory: 0.5Gi
            limits:
              cpu: 100m
              memory: 0.5Gi
          volumeClaimTemplates:
            - name: data
              spec:
                storageClassName:
                accessModes:
                  - ReadWriteOnce
                resources:
                  requests:
                    storage: 1Gi
      `kbcli cluster list-instances redis-nhdkgq --namespace default `

NAME                            NAMESPACE   CLUSTER        COMPONENT        STATUS    ROLE      ACCESSMODE   AZ       CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE    NODE                    CREATED-TIME                 
redis-nhdkgq-redis-0            default     redis-nhdkgq   redis            Running   primary   <none>       <none>   100m / 100m          512Mi / 512Mi           data:1Gi   minikube/192.168.49.2   Jan 25,2024 15:01 UTC+0800   
redis-nhdkgq-redis-1            default     redis-nhdkgq   redis            Running   primary   <none>       <none>   100m / 100m          512Mi / 512Mi           data:1Gi   minikube/192.168.49.2   Jan 25,2024 15:01 UTC+0800   
redis-nhdkgq-redis-sentinel-0   default     redis-nhdkgq   redis-sentinel   Running   <none>    <none>       <none>   100m / 100m          512Mi / 512Mi           data:1Gi   minikube/192.168.49.2   Jan 25,2024 15:01 UTC+0800   
redis-nhdkgq-redis-sentinel-1   default     redis-nhdkgq   redis-sentinel   Running   <none>    <none>       <none>   100m / 100m          512Mi / 512Mi           data:1Gi   minikube/192.168.49.2   Jan 25,2024 15:01 UTC+0800   
redis-nhdkgq-redis-sentinel-2   default     redis-nhdkgq   redis-sentinel   Running   <none>    <none>       <none>   100m / 100m          512Mi / 512Mi           data:1Gi   minikube/192.168.49.2   Jan 25,2024 15:01 UTC+0800   
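
Both `redis` instances in the listing above report the `primary` role. A minimal sketch of how that condition could be flagged automatically from the `list-instances` text follows. The helper name `find_primaries` is hypothetical and not part of kbcli, and the parsing assumes that every column up to and including ROLE is a single token without spaces, as in the output shown here.

```python
def find_primaries(table_text, component="redis"):
    """Return names of instances of `component` whose ROLE column is 'primary'.

    Assumption: every column up to and including ROLE holds one
    whitespace-free token, so a plain split() lines up with the header.
    """
    rows = [line.split() for line in table_text.splitlines() if line.strip()]
    if not rows:
        return []
    header = rows[0]
    name_i = header.index("NAME")
    comp_i = header.index("COMPONENT")
    role_i = header.index("ROLE")
    return [
        row[name_i]
        for row in rows[1:]
        if len(row) > role_i and row[comp_i] == component and row[role_i] == "primary"
    ]


# Rows abbreviated from the listing in this issue (trailing columns dropped).
SAMPLE = """\
NAME                            NAMESPACE   CLUSTER        COMPONENT        STATUS    ROLE
redis-nhdkgq-redis-0            default     redis-nhdkgq   redis            Running   primary
redis-nhdkgq-redis-1            default     redis-nhdkgq   redis            Running   primary
redis-nhdkgq-redis-sentinel-0   default     redis-nhdkgq   redis-sentinel   Running   <none>
"""

primaries = find_primaries(SAMPLE)
if len(primaries) > 1:
    # More than one primary in a primary/standby component means split roles.
    print("dual primary detected:", ", ".join(primaries))
```

A check like this could run after each step of the reproduction to pinpoint when the roles first diverge.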

set data

      `echo "set mykey 'ndupatzcct'" | kbcli cluster connect redis-nhdkgq --namespace default `

Connect to instance redis-nhdkgq-redis-1: out of redis-nhdkgq-redis-1(primary), redis-nhdkgq-redis-0(primary)
Unable to use a TTY - input is not a terminal or the right kind of file
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
OK

test failover

      `kubectl delete pod redis-nhdkgq-redis-0  --namespace default `

pod "redis-nhdkgq-redis-0" deleted
failover pod name:redis-nhdkgq-redis-1
failover  Success

stop and start ops

      `kbcli cluster stop redis-nhdkgq --auto-approve  --namespace default `

OpsRequest redis-nhdkgq-stop-hm79h created successfully, you can view the progress:
    kbcli cluster describe-ops redis-nhdkgq-stop-hm79h -n default
check cluster status

      `kbcli cluster list redis-nhdkgq --show-labels  --namespace default `

NAME           NAMESPACE   CLUSTER-DEFINITION   VERSION       TERMINATION-POLICY   STATUS    CREATED-TIME                 LABELS                                                                                                                             
redis-nhdkgq   default     redis                redis-7.0.6   WipeOut              Stopped   Jan 25,2024 15:01 UTC+0800   app.kubernetes.io/instance=redis-nhdkgq,clusterdefinition.kubeblocks.io/name=redis,clusterversion.kubeblocks.io/name=redis-7.0.6   
      `kbcli cluster start redis-nhdkgq --namespace default `

OpsRequest redis-nhdkgq-start-cd77h created successfully, you can view the progress:
    kbcli cluster describe-ops redis-nhdkgq-start-cd77h -n default
check cluster status

      `kbcli cluster list-instances redis-nhdkgq --namespace default `

NAME                            NAMESPACE   CLUSTER        COMPONENT        STATUS    ROLE      ACCESSMODE   AZ       CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE    NODE                    CREATED-TIME                 
redis-nhdkgq-redis-0            default     redis-nhdkgq   redis            Running   primary   <none>       <none>   100m / 100m          512Mi / 512Mi           data:1Gi   minikube/192.168.49.2   Jan 25,2024 15:04 UTC+0800   
redis-nhdkgq-redis-1            default     redis-nhdkgq   redis            Running   primary   <none>       <none>   100m / 100m          512Mi / 512Mi           data:1Gi   minikube/192.168.49.2   Jan 25,2024 15:04 UTC+0800   
redis-nhdkgq-redis-sentinel-0   default     redis-nhdkgq   redis-sentinel   Running   <none>    <none>       <none>   100m / 100m          512Mi / 512Mi           data:1Gi   minikube/192.168.49.2   Jan 25,2024 15:04 UTC+0800   
redis-nhdkgq-redis-sentinel-1   default     redis-nhdkgq   redis-sentinel   Running   <none>    <none>       <none>   100m / 100m          512Mi / 512Mi           data:1Gi   minikube/192.168.49.2   Jan 25,2024 15:04 UTC+0800   
redis-nhdkgq-redis-sentinel-2   default     redis-nhdkgq   redis-sentinel   Running   <none>    <none>       <none>   100m / 100m          512Mi / 512Mi           data:1Gi   minikube/192.168.49.2   Jan 25,2024 15:04 UTC+0800   
check pod status done
check cluster connect
check cluster connect done
check ops status

hscale cluster

      `kbcli cluster hscale redis-nhdkgq --auto-approve --components redis --replicas 3 --namespace default `

OpsRequest redis-nhdkgq-horizontalscaling-2t465 created successfully, you can view the progress:
    kbcli cluster describe-ops redis-nhdkgq-horizontalscaling-2t465 -n default
check cluster status

NAME                            NAMESPACE   CLUSTER        COMPONENT        STATUS    ROLE        ACCESSMODE   AZ       CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE    NODE                    CREATED-TIME                 
redis-nhdkgq-redis-0            default     redis-nhdkgq   redis            Running   secondary   <none>       <none>   100m / 100m          512Mi / 512Mi           data:1Gi   minikube/192.168.49.2   Jan 25,2024 15:04 UTC+0800   
redis-nhdkgq-redis-1            default     redis-nhdkgq   redis            Running   primary     <none>       <none>   100m / 100m          512Mi / 512Mi           data:1Gi   minikube/192.168.49.2   Jan 25,2024 15:04 UTC+0800   
redis-nhdkgq-redis-2            default     redis-nhdkgq   redis            Running   secondary   <none>       <none>   100m / 100m          512Mi / 512Mi           data:1Gi   minikube/192.168.49.2   Jan 25,2024 15:06 UTC+0800   
redis-nhdkgq-redis-sentinel-0   default     redis-nhdkgq   redis-sentinel   Running   <none>      <none>       <none>   100m / 100m          512Mi / 512Mi           data:1Gi   minikube/192.168.49.2   Jan 25,2024 15:04 UTC+0800   
redis-nhdkgq-redis-sentinel-1   default     redis-nhdkgq   redis-sentinel   Running   <none>      <none>       <none>   100m / 100m          512Mi / 512Mi           data:1Gi   minikube/192.168.49.2   Jan 25,2024 15:04 UTC+0800   
redis-nhdkgq-redis-sentinel-2   default     redis-nhdkgq   redis-sentinel   Running   <none>      <none>       <none>   100m / 100m          512Mi / 512Mi           data:1Gi   minikube/192.168.49.2   Jan 25,2024 15:04 UTC+0800   
check pod status done
check cluster connect
check cluster connect done
check ops status

get data returns nil

kbcli cluster connect redis-nhdkgq --namespace default 
Connect to instance redis-nhdkgq-redis-1: out of redis-nhdkgq-redis-1(primary), redis-nhdkgq-redis-0(secondary), redis-nhdkgq-redis-2(secondary)
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> get mykey
(nil)
127.0.0.1:6379> 

Expected behavior Data is not lost after the primary and standby cluster is restarted.

github-actions[bot] commented 4 months ago

This issue has been marked as stale because it has been open for 30 days with no activity.

nayutah commented 4 weeks ago

The line "Connect to instance redis-nhdkgq-redis-1: out of redis-nhdkgq-redis-1(primary), redis-nhdkgq-redis-0(primary)" shows that a dual primary occurred, so this is a problem with switchover.
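
The dual-primary state could also be cross-checked against Redis itself, independently of the roles kbcli reports. Redis exposes its own view in the `role` field of `INFO replication`, where the values are `master` and `slave` rather than `primary` and `secondary`. The sketch below assumes the raw `INFO replication` text has already been collected per pod (for example with `redis-cli`). The function names are hypothetical, and the sample strings are written by hand for illustration, not captured from this cluster.

```python
def parse_info(text):
    """Parse `key:value` lines of Redis INFO output into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and section headers
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def find_masters(info_by_pod):
    """Return the sorted pod names whose INFO replication role is 'master'."""
    return sorted(
        pod for pod, text in info_by_pod.items()
        if parse_info(text).get("role") == "master"
    )


# Hypothetical INFO replication snippets (illustrative only).
EXAMPLE = {
    "redis-nhdkgq-redis-0": "# Replication\r\nrole:master\r\nconnected_slaves:0\r\n",
    "redis-nhdkgq-redis-1": "# Replication\r\nrole:master\r\nconnected_slaves:0\r\n",
}

masters = find_masters(EXAMPLE)
if len(masters) > 1:
    print("more than one master:", masters)
```

If Redis reports a single master while kbcli shows two primaries, the role labels are stale; if Redis itself reports two masters, the replication topology is actually split, which would be consistent with the lost key.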