OT-CONTAINER-KIT / redis-operator

A Golang-based Redis operator that creates and manages Redis standalone, cluster, replication, and sentinel setups on Kubernetes.
https://ot-redis-operator.netlify.app/
Apache License 2.0

redis slaves listen on node IP, reported as down by sentinels #927

Open lukastopiarz opened 1 month ago

What version of redis operator are you using? v0.15.1

What operating system and processor architecture are you using (kubectl version)? v1.25.16

What did you do?

I am trying to deploy a 3-node HA setup of redis-replication with redis-sentinel.

redis-replication:

---
apiVersion: redis.redis.opstreelabs.in/v1beta2
kind: RedisReplication
metadata:
  name: redis-replication
spec:
  clusterSize: 3
  podSecurityContext:
    runAsUser: 1000
    fsGroup: 1000
  kubernetesConfig:
    image: registry.xyz.zone/external/quay.io-opstree-redis:v7.2.3
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 101m
        memory: 128Mi
      limits:
        cpu: 101m
        memory: 128Mi
  storage:
    volumeClaimTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ceph-rbd-sc
        resources:
          requests:
            storage: 1Gi

redis-sentinels:

apiVersion: redis.redis.opstreelabs.in/v1beta2
kind: RedisSentinel
metadata:
  name: redis
spec:
  clusterSize: 3
  podSecurityContext:
    runAsUser: 1000
    fsGroup: 1000
  pdb:
    enabled: true
    minAvailable: 1
  redisSentinelConfig:
    masterGroupName: redis
    redisReplicationName: redis-replication
    downAfterMilliseconds: "10000"
    failoverTimeout: "20000"
  kubernetesConfig:
    image: registry.xyz.zone/external/quay.io-opstree-redis-sentinel:v7.2.3
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 500m
        memory: 2Gi
      limits:
        memory: 2Gi

pods:

k get pod -o wide | grep redis
redis-replication-0                                               1/1     Running   0              19m     10.0.10.235   tools-k8s-drc-green-cz-prod-worker-rmx4p-k2vkh   <none>           <none>
redis-replication-1                                               1/1     Running   0              19m     10.0.0.25     tools-k8s-drc-green-cz-prod-worker-rmx4p-z9zvc   <none>           <none>
redis-replication-2                                               1/1     Running   0              19m     10.0.12.220   tools-k8s-drc-green-cz-prod-worker-rmx4p-7t26r   <none>           <none>
redis-sentinel-0                                                  1/1     Running   0              18m     10.0.10.107   tools-k8s-drc-green-cz-prod-worker-rmx4p-k2vkh   <none>           <none>
redis-sentinel-1                                                  1/1     Running   0              18m     10.0.12.6     tools-k8s-drc-green-cz-prod-worker-rmx4p-7t26r   <none>           <none>
redis-sentinel-2                                                  1/1     Running   0              18m     10.0.0.118    tools-k8s-drc-green-cz-prod-worker-rmx4p-z9zvc   <none>           <none>

info from master:

kubectl exec -it redis-replication-0 -- redis-cli
127.0.0.1:6379> role
1) "master"
2) (integer) 2489524
3) 1) 1) **"10.138.26.22"** <- SLAVES ON NODE IP INSTEAD OF POD IP?
      2) "6379"
      3) "2489524"
   2) 1) **"10.138.26.28"**  <- SLAVES ON NODE IP INSTEAD OF POD IP?
      2) "6379"
      3) "2489524"
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.138.26.22,port=6379,state=online,offset=2520250,lag=1
slave1:ip=10.138.26.28,port=6379,state=online,offset=2520250,lag=0
master_failover_state:no-failover
master_replid:dd1cedb046f34308d455d65ac45ec3144a560556
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:2522899
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1471708
repl_backlog_histlen:1051192

info from sentinel:

sentinel slaves redis
1)  1) "name"
    2) "10.138.26.28:6379"
    3) "ip"
    4) "10.138.26.28"
    5) "port"
    6) "6379"
    7) "runid"
    8) ""
    9) "flags"
   10) "s_down,slave,disconnected"
   11) "link-pending-commands"
   12) "3"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "1211357"
   17) "last-ok-ping-reply"
   18) "1211357"
   19) "last-ping-reply"
   20) "1211357"
   21) "s-down-time"
   22) "1201347"
   23) "down-after-milliseconds"
   24) "10000"
   25) "info-refresh"
   26) "0"
   27) "role-reported"
   28) "slave"
   29) "role-reported-time"
   30) "1211357"
   31) "master-link-down-time"
   32) "0"
   33) "master-link-status"
   34) "err"
   35) "master-host"
   36) "?"
   37) "master-port"
   38) "0"
   39) "slave-priority"
   40) "100"
   41) "slave-repl-offset"
   42) "0"
   43) "replica-announced"
   44) "1"
2)  1) "name"
    2) "10.138.26.22:6379"
    3) "ip"
    4) "10.138.26.22"
    5) "port"
    6) "6379"
    7) "runid"
    8) ""
    9) "flags"
   10) "s_down,slave,disconnected"
   11) "link-pending-commands"
   12) "3"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "1211360"
   17) "last-ok-ping-reply"
   18) "1211360"
   19) "last-ping-reply"
   20) "1211360"
   21) "s-down-time"
   22) "1201347"
   23) "down-after-milliseconds"
   24) "10000"
   25) "info-refresh"
   26) "0"
   27) "role-reported"
   28) "slave"
   29) "role-reported-time"
   30) "1211360"
   31) "master-link-down-time"
   32) "0"
   33) "master-link-status"
   34) "err"
   35) "master-host"
   36) "?"
   37) "master-port"
   38) "0"
   39) "slave-priority"
   40) "100"
   41) "slave-repl-offset"
   42) "0"
   43) "replica-announced"
   44) "1"

logs from sentinel:

k logs redis-sentinel-0
Sentinel is running without password which is not recommended
Running sentinel without TLS mode
ACL_MODE is not true, skipping ACL file modification
Starting  sentinel service .....
1:X 15 May 2024 12:57:47.830 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 15 May 2024 12:57:47.830 * Redis version=7.2.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 15 May 2024 12:57:47.830 * Configuration loaded
1:X 15 May 2024 12:57:47.830 * monotonic clock: POSIX clock_gettime
1:X 15 May 2024 12:57:47.831 # Failed to write PID file: Permission denied
1:X 15 May 2024 12:57:47.831 * Running mode=sentinel, port=26379.
1:X 15 May 2024 12:57:47.835 * Sentinel new configuration saved on disk
1:X 15 May 2024 12:57:47.835 * Sentinel ID is e305819f5261e47142197759d13bd1146218ef8a
1:X 15 May 2024 12:57:47.835 # +monitor master redis 10.0.10.235 6379 quorum 2
1:X 15 May 2024 12:57:47.835 * +slave slave 10.138.26.22:6379 10.138.26.22 6379 @ redis 10.0.10.235 6379
1:X 15 May 2024 12:57:47.838 * Sentinel new configuration saved on disk
1:X 15 May 2024 12:57:47.838 * +slave slave 10.138.26.28:6379 10.138.26.28 6379 @ redis 10.0.10.235 6379
1:X 15 May 2024 12:57:47.841 * Sentinel new configuration saved on disk
1:X 15 May 2024 12:57:48.637 * +sentinel sentinel 23ec30d68a30f766aad729a9e9ac436ec46ea91d 10.0.12.6 26379 @ redis 10.0.10.235 6379
1:X 15 May 2024 12:57:48.641 * Sentinel new configuration saved on disk
1:X 15 May 2024 12:57:49.804 * +sentinel sentinel 2ebf86da2f01b97f96428843c77ff424189b342c 10.0.0.118 26379 @ redis 10.0.10.235 6379
1:X 15 May 2024 12:57:49.808 * Sentinel new configuration saved on disk
1:X 15 May 2024 12:57:57.848 # +sdown slave 10.138.26.28:6379 10.138.26.28 6379 @ redis 10.0.10.235 6379
1:X 15 May 2024 12:57:57.848 # +sdown slave 10.138.26.22:6379 10.138.26.22 6379 @ redis 10.0.10.235 6379
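
A quick way to read these log lines is to measure how long each replica stayed "up" after the sentinel discovered it. The sketch below parses the `+slave` and `+sdown` lines copied from the log above and prints the delay in milliseconds. The helper name `parse_event` is made up for illustration, and the timestamp format string assumes an English locale for the month name.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the sentinel log, e.g. "15 May 2024 12:57:47.835".
FMT = "%d %b %Y %H:%M:%S.%f"

# Lines copied verbatim from the redis-sentinel-0 log in this report.
LOG = """\
1:X 15 May 2024 12:57:47.835 * +slave slave 10.138.26.22:6379 10.138.26.22 6379 @ redis 10.0.10.235 6379
1:X 15 May 2024 12:57:47.838 * +slave slave 10.138.26.28:6379 10.138.26.28 6379 @ redis 10.0.10.235 6379
1:X 15 May 2024 12:57:57.848 # +sdown slave 10.138.26.28:6379 10.138.26.28 6379 @ redis 10.0.10.235 6379
1:X 15 May 2024 12:57:57.848 # +sdown slave 10.138.26.22:6379 10.138.26.22 6379 @ redis 10.0.10.235 6379
"""

def parse_event(line):
    """Return (timestamp, event, instance name) for one sentinel log line."""
    parts = line.split()
    timestamp = datetime.strptime(" ".join(parts[1:5]), FMT)
    return timestamp, parts[6], parts[8]

discovered = {}
delay_ms = {}
for line in LOG.splitlines():
    timestamp, event, name = parse_event(line)
    if event == "+slave":
        discovered[name] = timestamp
    elif event == "+sdown" and name in discovered:
        # Integer milliseconds between discovery and subjective-down.
        delay_ms[name] = (timestamp - discovered[name]) // timedelta(milliseconds=1)

print(delay_ms)
```

Both delays come out just above 10000 ms, which matches the `downAfterMilliseconds: "10000"` setting. This suggests (it does not prove) that the sentinel never received a valid reply on the node IP addresses at all, rather than losing a connection that had worked before.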

worker nodes IPs:

tools-k8s-drc-green-cz-prod-worker-rmx4p-58jxc   Ready    <none>          5d3h   v1.25.16   10.138.26.27   10.138.26.27   Ubuntu 22.04.4 LTS   5.15.0-102-generic   containerd://1.6.21
tools-k8s-drc-green-cz-prod-worker-rmx4p-6v4s4   Ready    <none>          5d4h   v1.25.16   10.138.26.18   10.138.26.18   Ubuntu 22.04.4 LTS   5.15.0-102-generic   containerd://1.6.21
tools-k8s-drc-green-cz-prod-worker-rmx4p-7t26r   Ready    <none>          24h    v1.25.16   10.138.26.28   10.138.26.28   Ubuntu 22.04.4 LTS   5.15.0-102-generic   containerd://1.6.21
tools-k8s-drc-green-cz-prod-worker-rmx4p-8l8mw   Ready    <none>          5d3h   v1.25.16   10.138.26.21   10.138.26.21   Ubuntu 22.04.4 LTS   5.15.0-102-generic   containerd://1.6.21
tools-k8s-drc-green-cz-prod-worker-rmx4p-k2vkh   Ready    <none>          2d6h   v1.25.16   10.138.26.19   10.138.26.19   Ubuntu 22.04.4 LTS   5.15.0-102-generic   containerd://1.6.21
tools-k8s-drc-green-cz-prod-worker-rmx4p-kxvtm   Ready    <none>          24h    v1.25.16   10.138.26.30   10.138.26.30   Ubuntu 22.04.4 LTS   5.15.0-102-generic   containerd://1.6.21
tools-k8s-drc-green-cz-prod-worker-rmx4p-z9zvc   Ready    <none>          5d3h   v1.25.16   10.138.26.22   10.138.26.22   Ubuntu 22.04.4 LTS   5.15.0-102-generic   containerd://1.6.21
tools-k8s-drc-green-cz-prod-worker-rmx4p-zpl6m   Ready    <none>          24h    v1.25.16   10.138.26.20   10.138.26.20   Ubuntu 22.04.4 LTS   5.15.0-102-generic   containerd://1.6.21

What did you expect to see? I expected the sentinels to report both slaves as connected, not down. I also expected the slaves to be registered with their pod IPs, not node IPs.

What did you see instead? The master and the sentinels see the slaves on node IPs (10.138.26.x) instead of pod IPs (10.0.x.y). I suspect this is why the sentinels mark them as down.
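
To make the mismatch easy to check, here is a minimal sketch that classifies the replica addresses from the master's `info replication` output as pod or node IPs. The pod network `10.0.0.0/16` is an assumption inferred from the pod IPs listed above (the actual cluster CIDR is not stated in this report), and the function names are illustrative only.

```python
import ipaddress

# Assumption: pod network inferred from the pod IPs in `kubectl get pod -o wide`;
# the real cluster CIDR is not given in the report.
POD_NETWORK = ipaddress.ip_network("10.0.0.0/16")

# Worker node IPs copied from the node listing in this report.
NODE_IPS = {
    "10.138.26.27", "10.138.26.18", "10.138.26.28", "10.138.26.21",
    "10.138.26.19", "10.138.26.30", "10.138.26.22", "10.138.26.20",
}

# Relevant lines copied from `info replication` on redis-replication-0.
INFO = """\
role:master
connected_slaves:2
slave0:ip=10.138.26.22,port=6379,state=online,offset=2520250,lag=1
slave1:ip=10.138.26.28,port=6379,state=online,offset=2520250,lag=0
"""

def replica_ips(info_text):
    """Extract the ip field from every slaveN:... line."""
    ips = []
    for line in info_text.splitlines():
        key, _, value = line.strip().partition(":")
        if key.startswith("slave") and key[5:].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            ips.append(fields["ip"])
    return ips

def classify(ip):
    """Label an address as a pod IP, a known node IP, or unknown."""
    if ipaddress.ip_address(ip) in POD_NETWORK:
        return "pod"
    if ip in NODE_IPS:
        return "node"
    return "unknown"

result = {ip: classify(ip) for ip in replica_ips(INFO)}
print(result)  # {'10.138.26.22': 'node', '10.138.26.28': 'node'}
```

Both replica addresses are node IPs: 10.138.26.22 is worker `z9zvc` (hosting redis-replication-1) and 10.138.26.28 is worker `7t26r` (hosting redis-replication-2), while the master's own address 10.0.10.235 is a pod IP.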