Closed: dm3ch closed this issue 2 years ago
Can you please show me the manifest which you are applying?
Here's a dump of redis-cluster yaml from k8s:
❯ kubectl -n fut-1 get rediscluster redis-cluster-cinema -o yaml
apiVersion: redis.redis.opstreelabs.in/v1beta1
kind: RedisCluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"redis.redis.opstreelabs.in/v1beta1","kind":"RedisCluster","metadata":{"annotations":{"meta.helm.sh/release-name":"redis-cluster-cinema","meta.helm.sh/release-namespace":"fut-1"},"creationTimestamp":"2022-01-25T03:35:39Z","generation":1,"labels":{"app.kubernetes.io/component":"middleware","app.kubernetes.io/instance":"redis-cluster-cinema","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"redis-cluster-cinema","app.kubernetes.io/version":"0.8.0","helm.sh/chart":"redis-cluster-0.8.0"},"name":"redis-cluster-cinema","namespace":"fut-1","resourceVersion":"1337191080","uid":"199077d1-e436-409c-81ef-94e3ee38876d"},"spec":{"clusterSize":3,"kubernetesConfig":{"image":"quay.io/opstree/redis:v6.2.5","imagePullPolicy":"IfNotPresent","resources":{"limits":{"cpu":"1000m","memory":"1200Mi"},"requests":{"cpu":"100m","memory":"1024Mi"}},"serviceType":"ClusterIP"},"redisExporter":{"enabled":true,"image":"quay.io/opstree/redis-exporter:1.0","imagePullPolicy":"IfNotPresent","resources":{"limits":{"cpu":"100m","memory":"128Mi"},"requests":{"cpu":"100m","memory":"128Mi"}}},"redisFollower":{"serviceType":"ClusterIP"},"redisLeader":{"serviceType":"ClusterIP"}}}
    meta.helm.sh/release-name: redis-cluster-cinema
    meta.helm.sh/release-namespace: fut-1
  creationTimestamp: "2022-02-08T10:52:15Z"
  generation: 1
  labels:
    app.kubernetes.io/component: middleware
    app.kubernetes.io/instance: redis-cluster-cinema
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: redis-cluster-cinema
    app.kubernetes.io/version: 0.8.0
    helm.sh/chart: redis-cluster-0.8.0
  name: redis-cluster-cinema
  namespace: fut-1
  resourceVersion: "1390121553"
  uid: af45139d-7411-47fd-a00f-5b882587ff8e
spec:
  clusterSize: 3
  kubernetesConfig:
    image: quay.io/opstree/redis:v6.2.5
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 1000m
        memory: 1200Mi
      requests:
        cpu: 100m
        memory: 1024Mi
    serviceType: ClusterIP
  redisExporter:
    enabled: true
    image: quay.io/opstree/redis-exporter:1.0
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi
  redisFollower:
    serviceType: ClusterIP
  redisLeader:
    serviceType: ClusterIP
@iamabhishek-dubey ping (: I have the same problem.
So the problem is that we need to persist the nodes.conf file, which is generated by Redis. If we want to use a Redis cluster, we have to attach a minimal storage PVC to the StatefulSet, as shown in the example. Maybe I will create a story to add validation that storageSpec should be defined.
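For reference, attaching storage to the cluster would look roughly like the fragment below. This is a hedged sketch based on the operator's published example manifests; the exact field names (`storage.volumeClaimTemplate`) may differ between versions of the CRD, so check the reference for your operator release.

```yaml
# Fragment of a RedisCluster spec adding a small PVC per pod,
# so files like nodes.conf survive pod restarts (sketch only).
spec:
  storage:
    volumeClaimTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```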
@iamabhishek-dubey We have tested Redis clusters both with and without a PVC, and both are affected by the described problem.
nodes.conf contains pod IP addresses, which change when a pod is recreated, so just persisting nodes.conf wouldn't work, if I understand correctly.
I am not 100% sure, but I believe the right approach is for the operator to manage the node list: the operator should contact each node, check whether its node list contains all existing nodes (adding any that are missing), and remove nodes that no longer exist.
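The reconciliation described above could be sketched as follows. This is a hypothetical illustration, not the operator's actual code: it assumes the operator can fetch `CLUSTER NODES` output from any reachable node and knows the desired pod IPs from the Kubernetes API; the function name is made up.

```python
def diff_cluster_nodes(cluster_nodes_output: str, desired_ips: list[str]):
    """Compare the node table Redis reports with the pods that actually exist.

    Returns (to_meet, to_forget): pod IPs the cluster should CLUSTER MEET
    because they are missing from its table, and node IDs it should
    CLUSTER FORGET because their pods no longer exist.
    """
    known = {}  # ip -> node id, parsed from CLUSTER NODES lines
    for line in cluster_nodes_output.strip().splitlines():
        fields = line.split()
        node_id, addr = fields[0], fields[1]
        ip = addr.split(":")[0]  # addr looks like "10.0.0.1:6379@16379"
        known[ip] = node_id
    to_meet = sorted(set(desired_ips) - set(known))
    to_forget = sorted(nid for ip, nid in known.items() if ip not in desired_ips)
    return to_meet, to_forget
```

The operator's reconcile loop could then issue `CLUSTER MEET` for each IP in `to_meet` and `CLUSTER FORGET` for each ID in `to_forget` on every live node.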
P.S. It could also be worked around by creating a non-headless Service for each pod, which would allow connecting to the pod via an IP that doesn't change after recreation.
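Such a per-pod Service might look like the sketch below. The Service name and selector value are illustrative (one Service per pod would be needed), though the `statefulset.kubernetes.io/pod-name` label itself is set on every StatefulSet pod automatically by Kubernetes.

```yaml
# Non-headless (ClusterIP) Service targeting a single StatefulSet pod;
# other cluster nodes connect via the Service IP, which survives
# pod recreation (sketch with illustrative names).
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster-cinema-leader-0
  namespace: fut-1
spec:
  type: ClusterIP
  selector:
    statefulset.kubernetes.io/pod-name: redis-cluster-cinema-leader-0
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
```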
What version of redis operator are you using?
redis-operator version: 0.9.0 (this is not the full log, but earlier logs for this Redis cluster are very similar)

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (kubectl version)?
Output:

What did you do?
Created a Redis cluster with 3 leaders and 3 followers (1 follower for each leader). After some time, one of the k8s nodes was shut down and one of the follower pods was recreated.
What did you expect to see?
After deletion of the follower pod, it should start without problems. When the operator sees a failed follower, it should try to fix or recreate it.
What did you see instead?
The follower pod failed; the operator just logged the failure and didn't try to fix the issue.
Additional troubleshooting details: