chriswiggins opened 1 year ago
Further to this, I've just tried this on a k3d cluster and can confirm that it doesn't take the Kubernetes host into consideration at all - you can see that node 0 wasn't even selected for any scheduling, and the controller still places replicas onto the same Kubernetes node as their primary:
kubectl kc
POD NAME IP NODE ID ZONE USED MEMORY MAX MEMORY KEYS SLOTS
+ rediscluster-cluster-node-for-redis-jrx8h 10.42.1.7 172.28.0.4 1c44146f95f88cd9d95a36fd779a103d8bed54b1 unknown 20.60M 10.93G 10924-16383
| rediscluster-cluster-node-for-redis-vw4wv 10.42.1.6 172.28.0.4 837b1d87e8a949ff14aa2f414f71788b9eba4ed1 unknown 2.65M 10.93G
+ rediscluster-cluster-node-for-redis-ntp88 10.42.1.5 172.28.0.4 f8227cbc48e7d37062af0124bc823cbd42378b4d unknown 2.87M 10.93G 0-5461
| rediscluster-cluster-node-for-redis-g5pmn 10.42.2.5 172.28.0.5 330618a98938baa602a257f2aa8cec993b8bb78c unknown 2.69M 10.93G
+ rediscluster-cluster-node-for-redis-t785k 10.42.2.7 172.28.0.5 b01f0e768167a1e3ec2d1aa18c4b7881d92412b8 unknown 12.33M 10.93G 5462-10923
| rediscluster-cluster-node-for-redis-djgjf 10.42.2.6 172.28.0.5 1cdfbf25d99b43efef993f09378a4308a4330c07 unknown 2.67M 10.93G
NAME NAMESPACE PODS OPS STATUS REDIS STATUS NB PRIMARY REPLICATION ZONE SKEW
cluster-node-for-redis default 6/6/6 ClusterOK OK 3/3 1-1/1 0/0/BALANCED
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k3d-redis-test-server-0 Ready control-plane,etcd,master 32m v1.24.4+k3s1 172.28.0.3 <none> K3s dev 5.15.49-linuxkit containerd://1.6.6-k3s1
k3d-redis-test-server-1 Ready control-plane,etcd,master 31m v1.24.4+k3s1 172.28.0.4 <none> K3s dev 5.15.49-linuxkit containerd://1.6.6-k3s1
k3d-redis-test-server-2 Ready control-plane,etcd,master 31m v1.24.4+k3s1 172.28.0.5 <none> K3s dev 5.15.49-linuxkit containerd://1.6.6-k3s1
I've got the repo cloned and will give fixing this a go, so will follow up with any questions.
@chriswiggins did you have zoneAwareReplication enabled? I believe it should try to distribute pods belonging to the same shard across nodes.
If that doesn't work, could you try adding the zone label with the same value to all your nodes?
example: topology.kubernetes.io/zone: my-test-zone
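For concreteness, applying that label to the nodes listed above means every Node object carries the same zone value - e.g. this fragment of a Node's metadata (the same thing `kubectl label node k3d-redis-test-server-0 topology.kubernetes.io/zone=my-test-zone` would produce; node name taken from the `kubectl get nodes` output above):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: k3d-redis-test-server-0   # likewise server-1 and server-2
  labels:
    # all three nodes get the same value, so the operator sees one zone
    topology.kubernetes.io/zone: my-test-zone
```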
Hey @4n4nd - just tried setting that to no avail:
POD NAME IP NODE ID ZONE USED MEMORY MAX MEMORY KEYS SLOTS
+ rediscluster-cluster-node-for-redis-9z487 10.42.2.13 172.28.0.5 ae691a2b5cc0588d549052a0164d111f723933a2 my-test-zone 2.86M 10.93G 0-5461
| rediscluster-cluster-node-for-redis-vvchn 10.42.2.15 172.28.0.5 235d29b51b0eb4ef491255708ce6fe4a98adca7f my-test-zone 2.61M 10.93G
+ rediscluster-cluster-node-for-redis-d9cgd 10.42.1.13 172.28.0.4 8135ef94d11cdf8b240c02a2c31d1c58074e7cbd my-test-zone 18.10M 10.93G 5462-10923
| rediscluster-cluster-node-for-redis-mqjth 10.42.1.11 172.28.0.4 23902f9998ad0eed0ee9e68692900b361e13a0a3 my-test-zone 2.69M 10.93G
+ rediscluster-cluster-node-for-redis-tqgxz 10.42.1.12 172.28.0.4 3642beda98b780b0864d8e3514453a47c3beef9a my-test-zone 38.64M 10.93G 10924-16383
| rediscluster-cluster-node-for-redis-mts72 10.42.2.14 172.28.0.5 70745d1e2c50d8c25c5175b1028a630090bd7fcb my-test-zone 2.65M 10.93G
NAME NAMESPACE PODS OPS STATUS REDIS STATUS NB PRIMARY REPLICATION ZONE SKEW
cluster-node-for-redis default 6/6/6 ClusterOK OK 3/3 1-1/1 0/0/BALANCED
Anything else you can think of? zoneAwareReplication is set to true by default in the chart.
@chriswiggins hmm this is weird. The pods are scheduled by k8s and not the operator, and there are no pods scheduled on your third node.
Very true - that was weird, but that's still the scheduler doing its thing. I tried updating the affinity in values.yaml to the following, which has successfully spread the pods across all 3 nodes; however, the operator isn't that clever and still assigns a replica to a master running on the same node:
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: node-for-redis
        topologyKey: kubernetes.io/hostname
POD NAME IP NODE ID ZONE USED MEMORY MAX MEMORY KEYS SLOTS
+ rediscluster-cluster-node-for-redis-59sx8 10.42.0.14 172.28.0.3 852b807caf83b53527ddbe31756790ca73717abf my-test-zone 11.83M 10.93G 5462-10923
| rediscluster-cluster-node-for-redis-5vwbq 10.42.1.24 172.28.0.4 7624a87e72ae194e0dcadeaa711055cd67135075 my-test-zone 2.69M 10.93G
+ rediscluster-cluster-node-for-redis-6s96c 10.42.2.23 172.28.0.5 ce846026759398bd33000f2593cf2f5f224454ee my-test-zone 2.87M 10.93G 0-5461
| rediscluster-cluster-node-for-redis-97nlg 10.42.2.24 172.28.0.5 fa2ca4131cfa48ecc0901b04b522d1516432e04d my-test-zone 2.63M 10.93G
+ rediscluster-cluster-node-for-redis-v52jb 10.42.1.25 172.28.0.4 211c502d16fee20b94f8f9e115f2fd36fa5547d2 my-test-zone 36.87M 10.93G 10924-16383
| rediscluster-cluster-node-for-redis-gfksb 10.42.0.13 172.28.0.3 8e7e6d40484fe1f8da5eea8e99b8d949b186679f my-test-zone 2.67M 10.93G
NAME NAMESPACE PODS OPS STATUS REDIS STATUS NB PRIMARY REPLICATION ZONE SKEW
cluster-node-for-redis default 6/6/6 ClusterOK OK 3/3 1-1/1 0/0/BALANCED
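For what it's worth, the same spreading could presumably also be expressed with topologySpreadConstraints instead of anti-affinity - this is a hypothetical values.yaml fragment, only valid if the chart forwards the key through to the pod spec:

```yaml
# Hypothetical: assumes the chart exposes topologySpreadConstraints
# on the pod template. Spreads pods evenly across Kubernetes hosts.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: node-for-redis
```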
If I set a different zone name for each node, it all ends up balancing itself as expected:
+ rediscluster-cluster-node-for-redis-62zd4 10.42.1.26 172.28.0.4 ed81ca50b51f1e1910d8e07e7f6609ac1be48a0c my-test-zone-1 2.87M 10.93G 0-5461
| rediscluster-cluster-node-for-redis-zxk5z 10.42.2.26 172.28.0.5 85dd9ae90be024f3b4f4d0420b655148405ad2f8 my-test-zone-2 2.65M 10.93G
+ rediscluster-cluster-node-for-redis-gxj52 10.42.2.25 172.28.0.5 d717e3d81debd8f68b05cadf050829550e67245b my-test-zone-2 18.41M 10.93G 5462-10923
| rediscluster-cluster-node-for-redis-mfvk6 10.42.0.16 172.28.0.3 4b8f0187a92851e2aa31633756868b80f0b630dc my-test-zone-0 2.61M 10.93G
+ rediscluster-cluster-node-for-redis-sxcnk 10.42.0.15 172.28.0.3 5ed85e5d7570425c7702392e4374f7f7f58f0d8f my-test-zone-0 28.64M 10.93G 10924-16383
| rediscluster-cluster-node-for-redis-jphkb 10.42.1.27 172.28.0.4 10594407cbf6d0f9f8d69bf6f850ca361c11c2a3 my-test-zone-1 2.63M 10.93G
NAME NAMESPACE PODS OPS STATUS REDIS STATUS NB PRIMARY REPLICATION ZONE SKEW
cluster-node-for-redis default 6/6/6 ClusterOK OK 3/3 1-1/1 0/0/BALANCED
Based on this, zoneAwareReplication is definitely working, so realistically there are two options:
1) (preferred) Create a PR that adds a hostAwareReplication key and then also checks the Kubernetes host when scheduling replicas
2) (backup) Assign nodes different zones - this could have other side effects that may not be desired
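For option 1, the host-aware check could look roughly like this - a minimal Go sketch with made-up types and names (redisNode, pickReplica); the operator's real types and selection logic will differ:

```go
package main

import (
	"errors"
	"fmt"
)

// redisNode is a simplified view of a replica candidate: its cluster ID,
// the Kubernetes host it runs on, and its zone label.
type redisNode struct {
	ID   string
	Host string
	Zone string
}

// pickReplica prefers a candidate on a different Kubernetes host than the
// primary, falls back to a different zone, then to any candidate at all.
func pickReplica(primary redisNode, candidates []redisNode) (redisNode, error) {
	if len(candidates) == 0 {
		return redisNode{}, errors.New("no replica candidates")
	}
	for _, c := range candidates {
		if c.Host != primary.Host {
			return c, nil
		}
	}
	for _, c := range candidates {
		if c.Zone != primary.Zone {
			return c, nil
		}
	}
	return candidates[0], nil
}

func main() {
	primary := redisNode{ID: "a", Host: "172.28.0.4", Zone: "my-test-zone"}
	candidates := []redisNode{
		{ID: "b", Host: "172.28.0.4", Zone: "my-test-zone"},
		{ID: "c", Host: "172.28.0.5", Zone: "my-test-zone"},
	}
	r, _ := pickReplica(primary, candidates)
	fmt.Println(r.ID) // prints "c": the candidate on a different host
}
```

This mirrors the single-zone case above: with every node in the same zone, only the host check separates the candidates.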
Hi there,
Just reading through the source and the relevant issues here to try and determine the node selection criteria when creating replicas. We run a 3-node Kubernetes cluster (with Redis currently running outside the cluster) but are looking to move it onto this operator.
From what I can gather, it looks like replica placement is based on the zone topology key - what happens in a 3-node cluster where there is no such thing as zones? Is the controller smart enough not to attach a replica to a primary on the same node? Obviously this would be undesired behaviour, as a node going down would take the replica with it.
Happy to make a PR if pointed in the right direction!