apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0

[BUG]Data lost when inject network delay fault to Redis cluster #5107

Closed ahjing99 closed 5 months ago

ahjing99 commented 1 year ago

```
kbcli version
Kubernetes: v1.27.3-gke.100
KubeBlocks: 0.7.0-alpha.8
kbcli: 0.7.0-alpha.8
```

Steps:

  1. Inject a network delay fault into the leader pod
    
    `kbcli fault network delay --latency=15s -c=100 --jitter=0ms cluster-oqroov-redis-0 --ns-fault=default  --duration=2m`

NetworkChaos network-chaos-65g9m created
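For reference, the `kbcli fault network delay` command above creates a Chaos Mesh NetworkChaos resource under the hood. A roughly equivalent manifest, with field values taken from the command (the resource name and `mode` are illustrative assumptions, since kbcli generates its own), might look like:

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-delay-redis-0   # illustrative; kbcli generates a random name
  namespace: default
spec:
  action: delay
  mode: all                     # apply to every selected pod (here just one)
  selector:
    pods:
      default:
        - cluster-oqroov-redis-0
  delay:
    latency: "15s"              # --latency=15s
    correlation: "100"          # -c=100
    jitter: "0ms"               # --jitter=0ms
  duration: "2m"                # --duration=2m
```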

2. The original leader pod cluster-oqroov-redis-0 is 4/5 ready but can still be written to, and its role is still primary, so there are two primary pods

```
➜ ~ kbcli cluster connect cluster-oqroov
Connect to instance cluster-oqroov-redis-0: out of cluster-oqroov-redis-0, cluster-oqroov-redis-1
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> get mykey
"4"
127.0.0.1:6379> get mykey
"5"
127.0.0.1:6379> get mykey
"5"
127.0.0.1:6379> get mykey
"7"
```

```
kbcli cluster describe cluster-oqroov
Name: cluster-oqroov    Created Time: Sep 13,2023 09:49 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION       STATUS            TERMINATION-POLICY
default     redis                redis-7.0.6   ConditionsError   WipeOut

Endpoints:
COMPONENT        MODE        INTERNAL                                                        EXTERNAL
redis            ReadWrite   cluster-oqroov-redis.default.svc.cluster.local:6379
redis-sentinel   ReadWrite   cluster-oqroov-redis-sentinel.default.svc.cluster.local:26379

Topology:
COMPONENT        INSTANCE                          ROLE      STATUS    AZ              NODE                                                  CREATED-TIME
redis            cluster-oqroov-redis-0            primary   Running   us-central1-c   gke-yjtest-default-pool-47e27321-mvr4/10.128.15.201   Sep 13,2023 10:16 UTC+0800
redis            cluster-oqroov-redis-1            primary   Running   us-central1-c   gke-yjtest-default-pool-47e27321-rbkc/10.128.15.202   Sep 13,2023 09:49 UTC+0800
redis-sentinel   cluster-oqroov-redis-sentinel-0             Running   us-central1-c   gke-yjtest-default-pool-47e27321-mvr4/10.128.15.201   Sep 13,2023 09:49 UTC+0800
redis-sentinel   cluster-oqroov-redis-sentinel-1             Running   us-central1-c   gke-yjtest-default-pool-47e27321-h6tl/10.128.15.203   Sep 13,2023 09:50 UTC+0800
redis-sentinel   cluster-oqroov-redis-sentinel-2             Running   us-central1-c   gke-yjtest-default-pool-47e27321-rbkc/10.128.15.202   Sep 13,2023 09:50 UTC+0800

Resources Allocation:
COMPONENT        DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
redis            false       500m / 500m          1Gi / 1Gi               data:5Gi       kb-default-sc
redis-sentinel   false       500m / 500m          1Gi / 1Gi               data:5Gi       kb-default-sc

Images:
COMPONENT        TYPE             IMAGE
redis            redis            registry.cn-hangzhou.aliyuncs.com/apecloud/redis-stack-server:7.0.6-RC8
redis-sentinel   redis-sentinel   registry.cn-hangzhou.aliyuncs.com/apecloud/redis-stack-server:7.0.6-RC8

Data Protection:
AUTO-BACKUP   BACKUP-SCHEDULE   TYPE   BACKUP-TTL   LAST-SCHEDULE   RECOVERABLE-TIME
Disabled                               7d

Show cluster events: kbcli cluster list-events -n default cluster-oqroov
```

3. After the fault injection completed and cluster-oqroov-redis-0 recovered to 5/5 ready, its role changed to secondary

```
kbcli cluster describe cluster-oqroov
Name: cluster-oqroov    Created Time: Sep 13,2023 09:49 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION       STATUS    TERMINATION-POLICY
default     redis                redis-7.0.6   Running   WipeOut

Endpoints:
COMPONENT        MODE        INTERNAL                                                        EXTERNAL
redis            ReadWrite   cluster-oqroov-redis.default.svc.cluster.local:6379
redis-sentinel   ReadWrite   cluster-oqroov-redis-sentinel.default.svc.cluster.local:26379

Topology:
COMPONENT        INSTANCE                          ROLE        STATUS    AZ              NODE                                                  CREATED-TIME
redis            cluster-oqroov-redis-0            secondary   Running   us-central1-c   gke-yjtest-default-pool-47e27321-mvr4/10.128.15.201   Sep 13,2023 10:16 UTC+0800
redis            cluster-oqroov-redis-1            primary     Running   us-central1-c   gke-yjtest-default-pool-47e27321-rbkc/10.128.15.202   Sep 13,2023 09:49 UTC+0800
redis-sentinel   cluster-oqroov-redis-sentinel-0               Running   us-central1-c   gke-yjtest-default-pool-47e27321-mvr4/10.128.15.201   Sep 13,2023 09:49 UTC+0800
redis-sentinel   cluster-oqroov-redis-sentinel-1               Running   us-central1-c   gke-yjtest-default-pool-47e27321-h6tl/10.128.15.203   Sep 13,2023 09:50 UTC+0800
redis-sentinel   cluster-oqroov-redis-sentinel-2               Running   us-central1-c   gke-yjtest-default-pool-47e27321-rbkc/10.128.15.202   Sep 13,2023 09:50 UTC+0800

Resources Allocation:
COMPONENT        DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
redis            false       500m / 500m          1Gi / 1Gi               data:5Gi       kb-default-sc
redis-sentinel   false       500m / 500m          1Gi / 1Gi               data:5Gi       kb-default-sc

Images:
COMPONENT        TYPE             IMAGE
redis            redis            registry.cn-hangzhou.aliyuncs.com/apecloud/redis-stack-server:7.0.6-RC8
redis-sentinel   redis-sentinel   registry.cn-hangzhou.aliyuncs.com/apecloud/redis-stack-server:7.0.6-RC8

Data Protection:
AUTO-BACKUP   BACKUP-SCHEDULE   TYPE   BACKUP-TTL   LAST-SCHEDULE   RECOVERABLE-TIME
Disabled                               7d

Show cluster events: kbcli cluster list-events -n default cluster-oqroov
```

4. The data written during the dual-primary period was lost

```
➜ ~ kbcli cluster connect cluster-oqroov
Connect to instance cluster-oqroov-redis-0: out of cluster-oqroov-redis-0, cluster-oqroov-redis-1
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> get mykey
"4"
127.0.0.1:6379> get mykey
"5"
127.0.0.1:6379> get mykey
"5"
127.0.0.1:6379> get mykey
"7"
127.0.0.1:6379> get mykey
Error: Server closed the connection
not connected> get mykey
"1"
127.0.0.1:6379>
```
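The sequence above (`"7"` before the failover, `"1"` after) can be modeled without a real Redis. The following sketch, with purely illustrative names, shows why the writes accepted by the partitioned primary disappear once it is demoted and resynced from the new primary:

```python
# Toy model of the split-brain write loss observed in this issue.
# Node names and values mirror the report; this does not talk to Redis.

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.role = "secondary"

def simulate_partition_loss():
    old_primary = Node("cluster-oqroov-redis-0")
    old_primary.role = "primary"
    replica = Node("cluster-oqroov-redis-1")

    # Before the fault, replication is healthy: writes reach both nodes.
    for v in ("4", "5"):
        old_primary.data["mykey"] = v
        replica.data["mykey"] = v          # replicated

    # Network delay isolates redis-0; sentinel promotes redis-1.
    replica.role = "primary"               # dual-primary window begins
    old_primary.data["mykey"] = "7"        # accepted by the stale primary,
                                           # never replicated to redis-1
    replica.data["mykey"] = "1"            # independent write on new primary

    # Partition heals: redis-0 is demoted and does a full resync from
    # redis-1, discarding its divergent writes.
    old_primary.role = "secondary"
    old_primary.data = dict(replica.data)
    return old_primary.data["mykey"]

print(simulate_partition_loss())  # → 1 (the write "7" is gone)
```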

github-actions[bot] commented 1 year ago

This issue has been marked as stale because it has been open for 30 days with no activity

nayutah commented 6 months ago

This issue cannot be fixed under the sentinel + Redis architecture. When a network fault is injected into the master pod, neither the sentinels nor the Redis replica can reach the master, which creates a network partition. Sentinel detects the failure and promotes the replica to be the new master, but some writes succeed on the old master during the partition window; losing them is the common outcome of a network partition. The dual primary/master state, however, needs to be fixed ASAP.
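The sentinel decision described above can be sketched as a simple quorum vote (the quorum value and function names here are illustrative; the default sentinel quorum is whatever the deployment configures):

```python
# Toy model of sentinel's objective-down decision, assuming quorum = 2 of 3.
def master_objectively_down(sentinel_views, quorum=2):
    """Each entry is True if that sentinel considers the master unreachable."""
    return sum(sentinel_views) >= quorum

# The injected delay makes redis-0 unreachable from all three sentinels,
# so they agree the master is down and promote the replica.
print(master_objectively_down([True, True, True]))    # → True: failover
print(master_objectively_down([True, False, False]))  # → False: no failover
```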

nayutah commented 6 months ago

The dual primary/master state can be fixed the way Patroni does it for PostgreSQL. Sentinel always keeps fresh, authoritative information about the cluster: when a failover completes, sentinel can emit a role-change event to the lorry sidecar, the message is passed on to the KB controller, and finally the label on the partitioned primary pod is rectified, so the services selecting on the 'primary' label also converge to a consistent state. During the dual-primary phase, client writes routed to the partitioned primary pod will then fail with the reply 'You can't write against a read only replica'.
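The proposed rectification flow can be sketched as follows. The label key `kubeblocks.io/role` and the function names are assumptions for illustration; a real implementation would patch pod labels through the Kubernetes API:

```python
# Hypothetical model of the label-rectification flow: sentinel emits a
# role-change event, the controller updates pod role labels, and the
# 'primary' Service selector becomes consistent again.

pods = {
    "cluster-oqroov-redis-0": {"kubeblocks.io/role": "primary"},  # stale
    "cluster-oqroov-redis-1": {"kubeblocks.io/role": "primary"},  # promoted
}

def handle_role_change(event, pods):
    """Apply sentinel's authoritative role-change event to pod labels."""
    for name, role in event.items():
        pods[name]["kubeblocks.io/role"] = role

def primary_endpoints(pods):
    """Pods a Service selecting role=primary would route writes to."""
    return [n for n, l in pods.items() if l["kubeblocks.io/role"] == "primary"]

# Sentinel reports the correct roles after failover; the controller
# rectifies the stale label on the partitioned pod.
handle_role_change(
    {"cluster-oqroov-redis-0": "secondary",
     "cluster-oqroov-redis-1": "primary"},
    pods,
)
print(primary_endpoints(pods))  # → ['cluster-oqroov-redis-1']
```

Once the stale pod is actually demoted to replica, any client still writing to it gets Redis's 'You can't write against a read only replica' error instead of silently diverging.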