OT-CONTAINER-KIT / redis-operator

A Golang-based Redis operator that creates and manages Redis standalone/cluster/replication/sentinel mode setups on top of Kubernetes.
https://ot-redis-operator.netlify.app/
Apache License 2.0

Scaling redis replication setup does not work #836

Open wgnathanael opened 3 months ago

wgnathanael commented 3 months ago

What version of redis operator are you using? v0.15.0 (but also the latest main branch, where we modified a few lines and added log statements for our own understanding)

kubectl logs <_redis-operator_pod_name> -n <namespace>
{"level":"info","ts":"2024-03-20T21:55:30Z","logger":"controllers.RedisReplication","msg":"LeaderInfo","Request.Namespace":"sentientq","Request.Name":"redis-replication","Number of leaders":"1"}
{"level":"info","ts":"2024-03-20T21:55:30Z","logger":"controllers.RedisReplication","msg":"We have matching number of leaders","Request.Namespace":"sentientq","Request.Name":"redis-replication"}
{"level":"info","ts":"2024-03-20T21:55:35Z","logger":"controllers.RedisReplication","msg":"Reconciling opstree redis replication controller","Request.Namespace":"sentientq","Request.Name":"redis-replication"}
{"level":"info","ts":"2024-03-20T21:55:35Z","logger":"controllers.RedisReplication","msg":"Redis replication nodes are not ready yet","Request.Namespace":"sentientq","Request.Name":"redis-replication","Ready.Replicas":"2","Expected.Replicas":3}
{"level":"info","ts":"2024-03-20T21:55:40Z","logger":"controllers.RedisReplication","msg":"Reconciling opstree redis replication controller","Request.Namespace":"sentientq","Request.Name":"redis-replication"}
{"level":"info","ts":"2024-03-20T21:55:40Z","logger":"controllers.RedisReplication","msg":"Redis replication nodes are not ready yet","Request.Namespace":"sentientq","Request.Name":"redis-replication","Ready.Replicas":"2","Expected.Replicas":3}
{"level":"info","ts":"2024-03-20T21:56:40Z","logger":"controllers.RedisReplication","msg":"Reconciling opstree redis replication controller","Request.Namespace":"sentientq","Request.Name":"redis-replication"}
{"level":"info","ts":"2024-03-20T21:56:40Z","logger":"controllers.RedisReplication","msg":"LeaderInfo","Request.Namespace":"sentientq","Request.Name":"redis-replication","Number of leaders":"2"}
{"level":"info","ts":"2024-03-20T21:56:40Z","logger":"controllers.RedisReplication","msg":"Creating redis replication by executing replication creation commands","Request.Namespace":"sentientq","Request.Name":"redis-replication","Replication.Ready":"3"}
{"level":"info","ts":"2024-03-20T21:56:42Z","logger":"controllers.RedisReplication","msg":"FollowerInfo","Request.Namespace":"sentientq","Request.Name":"redis-replication","Number of followers":"1"}
{"level":"error","ts":"2024-03-20T21:56:43Z","logger":"controllers.RedisReplication","msg":"Error in getting Redis pod IP","namespace":"sentientq","podName":"","error":"resource name may not be empty","stacktrace":"github.com/OT-CONTAINER-KIT/redis-operator/k8sutils.getRedisServerIP\n\t/workspace/k8sutils/redis.go:34\ngithub.com/OT-CONTAINER-KIT/redis-operator/k8sutils.CreateMasterSlaveReplication\n\t/workspace/k8sutils/redis.go:574\ngithub.com/OT-CONTAINER-KIT/redis-operator/controllers.(*RedisReplicationReconciler).Reconcile\n\t/workspace/controllers/redisreplication_controller.go:87\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":"2024-03-20T21:56:54Z","logger":"controllers.RedisReplication","msg":"Reconciling opstree redis replication controller","Request.Namespace":"sentientq","Request.Name":"redis-replication"}
{"level":"info","ts":"2024-03-20T21:56:55Z","logger":"controllers.RedisReplication","msg":"LeaderInfo","Request.Namespace":"sentientq","Request.Name":"redis-replication","Number of leaders":"0"}
{"level":"info","ts":"2024-03-20T21:56:55Z","logger":"controllers.RedisReplication","msg":"We have matching number of leaders","Request.Namespace":"sentientq","Request.Name":"redis-replication"}

redis-operator version: v0.15.1

Does this issue reproduce with the latest release? Yes

What operating system and processor architecture are you using (kubectl version)? RHEL 9.2 x86_64

kubectl version Output
$ kubectl version
Client Version: v1.29.1+rke2r1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.1+rke2r1

What did you do?

We started with a simple replication.yaml taken from the example directory, configured TLS + redisPassword and a couple of volumes, and set clusterSize = 2.

Allowed it to come up and the controller to assign one as leader and one as follower. Our script to check this outputs:

============ Pod 0 ============
role:master
============ Pod 1 ============
role:slave
master_host:10.42.2.72

Then we edit the RedisReplication to increase clusterSize (though we've also tested changing the replica set that the controller created, with identical results).

The additional pod comes up. The controller notices and initiates the code to create leader/followers. Once it's complete we have the following output:

============ Pod 0 ============
role:slave
master_host:
============ Pod 1 ============
role:slave
master_host:10.42.2.72
============ Pod 2 ============
role:slave
master_host:

The script producing that output runs redis-cli info replication, as follows: run_redis_cmd $n "info replication" | grep -e 'role:' -e 'master_host:' -e 'slave[0-9]:'

What did you expect to see?

I would expect

============ Pod 0 ============
role:master
============ Pod 1 ============
role:slave
master_host:10.42.2.72
============ Pod 2 ============
role:slave
master_host:10.42.2.72

What did you see instead?

All three pods are slaves and none is the leader, as per:

============ Pod 0 ============
role:slave
master_host:
============ Pod 1 ============
role:slave
master_host:10.42.2.72
============ Pod 2 ============
role:slave
master_host:

Additional details: because we changed the logging to debug this, here's a diff of our changes thus far.

diff --git a/controllers/redisreplication_controller.go b/controllers/redisreplication_controller.go
index d6f9b85c..6e473636 100644
--- a/controllers/redisreplication_controller.go
+++ b/controllers/redisreplication_controller.go
@@ -77,16 +77,23 @@ func (r *RedisReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Req
        return ctrl.Result{RequeueAfter: time.Second * 60}, nil
    }

-   if len(k8sutils.GetRedisNodesByRole(ctx, r.K8sClient, r.Log, instance, "master")) > int(leaderReplicas) {
+   masterNodes := k8sutils.GetRedisNodesByRole(ctx, r.K8sClient, r.Log, instance, "master")
+   reqLogger.Info("LeaderInfo", "Number of leaders", strconv.Itoa(len(masterNodes)))
+
+   if len(masterNodes) > int(leaderReplicas) {
        reqLogger.Info("Creating redis replication by executing replication creation commands", "Replication.Ready", strconv.Itoa(int(redisReplicationInfo.Status.ReadyReplicas)))
-       masterNodes := k8sutils.GetRedisNodesByRole(ctx, r.K8sClient, r.Log, instance, "master")
        slaveNodes := k8sutils.GetRedisNodesByRole(ctx, r.K8sClient, r.Log, instance, "slave")
+       reqLogger.Info("FollowerInfo", "Number of followers", strconv.Itoa(len(slaveNodes)))
        err := k8sutils.CreateMasterSlaveReplication(ctx, r.K8sClient, r.Log, instance, masterNodes, slaveNodes)
        if err != nil {
+           reqLogger.Info("Unable to create master/slave replication - Will reconcile redis operator in again 60 seconds")
            return ctrl.Result{RequeueAfter: time.Second * 60}, err
        }
+   } else {
+       reqLogger.Info("We have matching number of leaders")
    }
-   reqLogger.Info("Will reconcile redis operator in again 10 seconds")
+
+   // reqLogger.Info("Will reconcile redis operator in again 10 seconds")
    return ctrl.Result{RequeueAfter: time.Second * 10}, nil
 }
drivebyer commented 3 months ago

@wgnathanael , Thank you for your feedback.

Based on your comments, there appears to be a misunderstanding concerning the system behavior. As per your description, when the number of replicas is increased by one, the newly started pod assumes the role of a slave. However, my understanding is that the default behavior when starting Redis is for the new instance to act as a master. Only in that case will the standalone master be added to the replication topology during the "CreateMasterSlaveReplication" process.

wgnathanael commented 3 months ago

Hello, Thanks for the quick response.

Yeah, it's fine that a new Redis instance starts out as a master. The issue is that once the operator controller runs, they all become slaves of nothing. So before the scale, I have 1 master and 1 slave. When I increase the number of replicas, I end up with 3 slaves, two of which don't know who the master is and one that still does, but the instance it connects to thinks it's a replica.

Essentially, what I think the problem is: the CreateMasterSlaveReplication process doesn't have a way of determining which of the two masters is the original one, and it reconfigures them incorrectly. I'm still investigating this but was hoping that someone here would recognize the behaviour and have a sense of how that code is getting it wrong.

wgnathanael commented 3 months ago

Just after I sent the above, I continued looking at the code. I think the issue is checkAttachedSlave; not so much the function itself as some other initial configuration issue. It looks for the previously existing master by examining all master/leader pods and finding the one with connected replicas. This must be failing, and indeed I remember that the connected-replica lines were missing from our output, so the pods are set up as replicas but can't communicate with their leader. I'll continue to investigate the root cause.
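For illustration, here is a minimal, self-contained Go sketch of how I read that selection logic; the helper names are invented for this example and are not the operator's actual functions:

package main

import (
	"context"
	"fmt"
)

// connectedSlaves is a hypothetical stand-in for running "INFO replication"
// against a pod and reading its connected_slaves field. In the real operator
// this goes through the Redis client (over TLS in our setup); it is stubbed
// here purely for illustration.
func connectedSlaves(ctx context.Context, pod string) int {
	return 0 // in our broken state every master reports connected_slaves:0
}

// findRealMaster mirrors my reading of the selection logic: among all pods
// reporting role:master, pick the one that already has replicas attached.
// When the followers never managed to connect (the missing tls-replication
// case), no master has attached replicas and an empty name comes back,
// which matches the "resource name may not be empty" error in the operator log.
func findRealMaster(ctx context.Context, masterPods []string) string {
	for _, pod := range masterPods {
		if connectedSlaves(ctx, pod) > 0 {
			return pod
		}
	}
	return ""
}

func main() {
	masters := []string{"redis-replication-0", "redis-replication-2"}
	fmt.Printf("real master: %q\n", findRealMaster(context.Background(), masters))
}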

In looking at the logs for the initial replica node I see the following:

9:S 21 Mar 2024 13:59:56.091 * MASTER <-> REPLICA sync started
9:S 21 Mar 2024 13:59:56.091 * Non blocking connect for SYNC fired the event.
9:S 21 Mar 2024 13:59:56.093 # Failed to read response from the server: No error information
9:S 21 Mar 2024 13:59:56.093 # Master did not respond to command during SYNC handshake

It must be having issues communicating with the leader node.

wgnathanael commented 3 months ago

This is the issue https://github.com/OT-CONTAINER-KIT/redis/pull/75. When I add tls-replication yes to my additional redis configuration it works. The follower can communicate with the leader.

It seems to me that, however unlikely this situation is, it may be worthwhile to add some logic to CreateMasterSlaveReplication so that, if we have multiple leader pods and multiple follower pods and can't find the "realMasterPod", we do nothing and log an error, since this situation only happens when the replicas haven't been able to connect to a master. We could also find the real leader by querying all the followers and making sure they all point to the same "real leader".
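As a rough sketch of the guard I have in mind (function names and signatures are invented for this example, not taken from the operator):

package main

import (
	"context"
	"errors"
	"fmt"
)

// masterHostOf is a hypothetical stand-in for asking a follower pod
// "INFO replication" and returning its master_host field.
func masterHostOf(ctx context.Context, follower string) string {
	return "" // stubbed for illustration
}

// electRealMaster sketches the proposed fallback: if the usual
// connected-replica check found nothing, ask every follower which master it
// points at. Only proceed when they all agree; otherwise do nothing and
// surface an error instead of reconfiguring the pods incorrectly.
func electRealMaster(ctx context.Context, followers []string) (string, error) {
	agreed := ""
	for _, f := range followers {
		host := masterHostOf(ctx, f)
		if host == "" {
			return "", fmt.Errorf("follower %s has no master_host; refusing to reconfigure replication", f)
		}
		if agreed == "" {
			agreed = host
		} else if agreed != host {
			return "", errors.New("followers disagree on the real master; refusing to reconfigure replication")
		}
	}
	if agreed == "" {
		return "", errors.New("no followers to query; cannot determine the real master")
	}
	return agreed, nil
}

func main() {
	master, err := electRealMaster(context.Background(), []string{"redis-replication-1"})
	if err != nil {
		fmt.Println("would log and skip:", err)
		return
	}
	fmt.Println("real master:", master)
}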

drivebyer commented 3 months ago

Hey @wgnathanael! Could you kindly share the Redis replication YAMLs that include TLS configuration? I'd love to use them for reproducing purposes. Thank you so much!

wgnathanael commented 3 months ago

Sure. The configmap:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-external-config
data:
  redis-additional.conf: |
    tls-replication yes
    loadmodule /usr/local/lib/libOmsDriver.so /usr/local/lib/rules/
---
apiVersion: v1
kind: Secret
metadata:
  name: scrs-redis-replication-tls
data:
  # I've included both tls.key and peer-key.pem because the way the operator reads this seems to have changed:
  # the key names in the RedisReplication TLS section used to define which keys were used here,
  # but the latest operator seems to simply expect tls.key/tls.crt etc.
  peer-key.pem: <base64 encoded data>
  peer.pem: <base64 encoded data>
  root-ca.pem: <base64 encoded data>
  tls.key: <base64 encoded data>
  tls.crt: <base64 encoded data>
  ca.crt: <base64 encoded data>
---
apiVersion: v1
kind: Secret
metadata:
  name: scrs-redis-secret
data:
  redisPassword: M2FiNGM4NzA=

Replication:

---
apiVersion: redis.redis.opstreelabs.in/v1beta2
kind: RedisReplication
metadata:
  name: redis-replication
spec:
  clusterSize: 2
  TLS:
    ca: root-ca.pem
    cert: peer.pem
    key: peer-key.pem
    secret:
      secretName: scrs-redis-replication-tls
  redisConfig:
    additionalRedisConfig: redis-external-config

  kubernetesConfig:
    image: registry.willowglen.ca/sq/sq-scadacom/scadacom-runtime-service/wg-redis:gnat-test-v3
    imagePullPolicy: IfNotPresent
    redisSecret:
      name: scrs-redis-secret
      key: redisPassword

  storage:
    volumeClaimTemplate:
      spec:
        # storageClassName: standard
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
  podSecurityContext:
    runAsUser: 1000
    fsGroup: 1000

I've removed some mounts and the sidecar we include as they don't really have anything to do with the issue.

The image (which is from a private registry, so you won't be able to pull it) has the following Dockerfile contents:

ARG BASE_BUILD_CONTAINER_VER=latest
ARG BASE_IMAGE=quay.io/opstree/redis:v7.0.12

FROM registry.willowglen.ca/sq/base-containers/base-build-container:${BASE_BUILD_CONTAINER_VER} as builder

FROM $BASE_IMAGE as base

ARG BASE_IMAGE
ARG OCI_LABEL_PREFIX=org.opencontainers

# update default values accordingly for the image 
ARG LABEL_PREFIX=sq.sq-scadacom.scadacom-runtime-service.wg-redis

# passed in as build-arg, with default values if not built by gitlab
ARG IMAGE_CREATED
ARG IMAGE_VERSION
ARG IMAGE_REVISION
ARG CODE_DATE
ARG CODE_BRANCH
ARG BUILD_ENV="dev-vm"
# PROJECT_URL is referenced by a LABEL below, so declare it as a build arg as well
ARG PROJECT_URL

LABEL $OCI_LABEL_PREFIX.image.created="$IMAGE_CREATED"
LABEL $OCI_LABEL_PREFIX.image.version="$IMAGE_VERSION"
LABEL $OCI_LABEL_PREFIX.image.revision="$IMAGE_REVISION"
LABEL $OCI_LABEL_PREFIX.image.base.name="$BASE_IMAGE"
LABEL $OCI_LABEL_PREFIX.image.title="Willowglen Redis"
LABEL $OCI_LABEL_PREFIX.image.description="This is the wg-redis image"

LABEL $LABEL_PREFIX.image.created="$IMAGE_CREATED"
LABEL $LABEL_PREFIX.image.url="$PROJECT_URL"
LABEL $LABEL_PREFIX.image.version="$IMAGE_VERSION"
LABEL $LABEL_PREFIX.image.revision="$IMAGE_REVISION"
LABEL $LABEL_PREFIX.image.base.name="$BASE_IMAGE"
LABEL $LABEL_PREFIX.image.code.date="$CODE_DATE"
LABEL $LABEL_PREFIX.image.code.branch="$CODE_BRANCH"
LABEL $LABEL_PREFIX.image.build-env="$BUILD_ENV"

USER root

# Install built OMS Driver redis module to expected location
RUN mkdir -p /usr/local/lib
COPY ./build/src/lib /usr/local/lib/

# Install required libraries
RUN apk add --no-cache libstdc++ gcompat

Essentially, we have a base build container where things were compiled, and then we copy the resulting Redis module into your image.

Let me know if you need any more information.