OT-CONTAINER-KIT / redis-operator

A golang based redis operator that will make/oversee Redis standalone/cluster/replication/sentinel mode setup on top of the Kubernetes.
https://ot-redis-operator.netlify.app/
Apache License 2.0
731 stars 207 forks source link

feat: add support for configurable probe handlers #934

Closed com6056 closed 1 month ago

com6056 commented 1 month ago

Description

Add the ability to specify your own ProbeHandler, which switches the Probe type to the upstream corev1.Probe.

It looks like this was originally added in https://github.com/OT-CONTAINER-KIT/redis-operator/pull/178, but then for some reason removed in https://github.com/OT-CONTAINER-KIT/redis-operator/pull/302 🤔

@iamabhishek-dubey can you provide any context here? Is this change reasonable?

Type of change

Checklist

Additional Context

For context, we really need to check more of the cluster state before rolling pods, so we want to add our own probe (something like this, although haven't tested it yet):

#!/bin/sh

size=$(redis-cli cluster info | grep cluster_size | cut -d: -f2 | tr -d '[:space:]')

# Cluster is bootstrapping
if [ "$size" -lt $CLUSTER_SIZE ]; then
  echo "WARN: Cluster size is $size"
  exit
fi

state=$(redis-cli cluster info | grep cluster_state | cut -d: -f2 | tr -d '[:space:]')

if [ "$state" != ok ]; then
  echo "FAIL: Cluster state is $state"
  exit 1
fi

slots_assigned=$(redis-cli cluster info | grep cluster_slots_assigned | cut -d: -f2 | tr -d '[:space:]')
slots_ok=$(redis-cli cluster info | grep cluster_slots_ok | cut -d: -f2 | tr -d '[:space:]')

if [ "$slots_assigned" != "$slots_ok" ]; then
  echo "FAIL: Not all assigned cluster slots are ok"
  exit 1
fi

role=$(redis-cli info replication | grep role | cut -d: -f2 | tr -d '[:space:]')

if [ "$role" = master ]; then
  connected_followers=$(redis-cli info replication | grep connected_slaves | cut -d: -f2 | tr -d '[:space:]')

  if [ "$connected_followers" != 1 ]; then
    echo "FAIL: Connected followers is $connected_followers"
    exit 1
  fi
else
  leader_link_status=$(redis-cli info replication | grep master_link_status | cut -d: -f2 | tr -d '[:space:]')

  if [ "$leader_link_status" != up ]; then
    echo "FAIL: Leader link status is $leader_link_status"
    exit 1
  fi
fi

It looks like this was asked for in https://github.com/OT-CONTAINER-KIT/redis-operator/issues/170 as well, and I wonder if we should just build this functionality into the operator itself rather than having everyone have to figure out their own method for ensuring the cluster rolls safely.

com6056 commented 1 month ago

@drivebyer any chance I could get a review of this? 🙏 Thanks! 🎉

drivebyer commented 1 month ago

@com6056 I noticed there are many changes in the API. Do these changes result in breaking changes?

com6056 commented 1 month ago

@com6056 I noticed there are many changes in the API. Do these changes result in breaking changes?

They shouldn't be breaking, the probes had all of the same fields as the corev1.Probe struct, so it should be a no-op for anyone that doesn't actually set a probe handler

It looks like this was also originally added in https://github.com/OT-CONTAINER-KIT/redis-operator/pull/178, but then for some reason removed in https://github.com/OT-CONTAINER-KIT/redis-operator/pull/302

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 7.69231% with 24 lines in your changes are missing coverage. Please review.

Project coverage is 39.12%. Comparing base (d121d86) to head (7e3ecde). Report is 56 commits behind head on master.

Files Patch % Lines
k8sutils/statefulset.go 0.00% 18 Missing :warning:
k8sutils/redis-replication.go 0.00% 2 Missing :warning:
k8sutils/redis-sentinel.go 33.33% 2 Missing :warning:
k8sutils/redis-standalone.go 0.00% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #934 +/- ## ========================================== + Coverage 35.20% 39.12% +3.92% ========================================== Files 19 20 +1 Lines 3213 2717 -496 ========================================== - Hits 1131 1063 -68 + Misses 2015 1583 -432 - Partials 67 71 +4 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

drivebyer commented 1 month ago

Does Kubernetes validate probe fields if set InitialDelaySeconds to -1? @com6056

com6056 commented 1 month ago

Does Kubernetes validate probe fields if set InitialDelaySeconds to -1? @com6056

It doesn't appear that it does in CRDs (I'm guessing the operator would just fail to create the StatefulSet), and kubebuilder unfortunately doesn't seem to have a way to add validation to nested fields for this 😞

com6056 commented 1 month ago

If we want to keep the existing validation we can just add the ProbeHandler to the existing Probe struct, happy to make that change if we want, but I figured switching to just the upstream corev1.Probe would be better long-term rather than maintaining our own identical struct

drivebyer commented 1 month ago

but I figured switching to just the upstream corev1.Probe would be better long-term rather than maintaining our own identical struct

agree.

BTW, I test this yaml:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
    - name: nginx-container
      image: nginx
      ports:
        - containerPort: 80
      readinessProbe:
        initialDelaySeconds: -1
        periodSeconds: -1
        httpGet:
          path: /index.html
          port: 80
➜  tools git:(main) ✗ kubectl -n mcamel-system --kubeconfig /Users/XXX/Documents/kubeconfig/xx-dev.yaml apply -f yamls/test.yaml
The Pod "nginx-pod" is invalid: 
* spec.containers[0].readinessProbe.initialDelaySeconds: Invalid value: -1: must be greater than or equal to 0
* spec.containers[0].readinessProbe.periodSeconds: Invalid value: -1: must be greater than or equal to 0