bitnami/charts

[bitnami/redis-cluster] Deployment doesn't work (cluster in `fail` state) #10172

Closed ryuzakyl closed 1 year ago

ryuzakyl commented 2 years ago

Name and Version

bitnami/redis-cluster 7.5.2, 7.5.0

What steps will reproduce the bug?

Simply run the command from the TL;DR section of the chart's README: $ helm install my-release bitnami/redis-cluster

Are you using any custom parameters or values?

Yes and No (see reproduction steps above).

I've also tried with some parameters for tweaking the readiness probe configuration:

$ helm install redis-cluster bitnami/redis-cluster \
    --version="7.5.0" \
    --set persistence.size="2Gi" \
    --set persistence.volumePermissions=true \
    --set global.storageClass="nfs-client" \
    --set redis.readinessProbe.initialDelaySeconds=10 \
    --set redis.readinessProbe.periodSeconds=5 \
    --set redis.readinessProbe.timeoutSeconds=3 \
    --set redis.readinessProbe.successThreshold=1 \
    --set redis.readinessProbe.failureThreshold=10

Also tried with the following values.yaml (both from helm install and Terraform):

redis:
  nodeSelector:
    eks.amazonaws.com/capacityType: ON_DEMAND

metrics:
  enabled: true

  ## Metrics exporter pod Annotation and Labels
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9121"

service:
  annotations:
    service.annotations.prometheus.io/port: "9121"

persistence:
  size: 2Gi

What is the expected behavior?

The deployment works flawlessly, with all pods reaching the Ready state.

What do you see instead?

When the deployment finishes, all the pods indicate the Running state, but looking at the READY column, they are all 0/1.

NAME                             READY   STATUS    RESTARTS   AGE
pod/my-release-redis-cluster-0   0/1     Running   0          12m
pod/my-release-redis-cluster-1   0/1     Running   1          12m
pod/my-release-redis-cluster-2   0/1     Running   0          12m
pod/my-release-redis-cluster-3   0/1     Running   0          12m
pod/my-release-redis-cluster-4   0/1     Running   0          12m
pod/my-release-redis-cluster-5   0/1     Running   0          12m

Next, we check the events for the reason and see that it is a readiness probe failure:

Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Normal   Scheduled               15m                    default-scheduler        Successfully assigned default/my-release-redis-cluster-0 to REDACTED
  Normal   SuccessfulAttachVolume  15m                    attachdetach-controller  AttachVolume.Attach succeeded for volume "<REDACTED>"
  Normal   Pulled                  15m                    kubelet                  Container image "docker.io/bitnami/redis-cluster:6.2.7-debian-10-r0" already present on machine
  Normal   Created                 15m                    kubelet                  Created container my-release-redis-cluster
  Normal   Started                 15m                    kubelet                  Started container my-release-redis-cluster
  Warning  Unhealthy               4m59s (x120 over 14m)  kubelet                  Readiness probe failed: cluster_state:fail

Next, we try to determine (at first glance) what could be causing this fail state using redis-cli. For a Master node:

I have no name!@redis-client:/$ REDISCLI_AUTH="PASSWORD" redis-cli -h POD-IP -p 6379
POD-IP:6379> ping
PONG
POD-IP:6379> cluster info
cluster_state:fail
cluster_slots_assigned:5461
cluster_slots_ok:5461
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:2
cluster_size:1
cluster_current_epoch:5
cluster_my_epoch:1
cluster_stats_messages_ping_sent:1568
cluster_stats_messages_pong_sent:1570
cluster_stats_messages_sent:3138
cluster_stats_messages_ping_received:1570
cluster_stats_messages_pong_received:1568
cluster_stats_messages_received:3138
POD-IP:6379> get name
(error) CLUSTERDOWN Hash slot not served

For a Slave node:

I have no name!@redis-client:/$ REDISCLI_AUTH="PASSWORD" redis-cli -h POD-IP -p 6379
POD-IP:6379> ping
PONG
POD-IP:6379> cluster info
cluster_state:fail
cluster_slots_assigned:5462
cluster_slots_ok:5462
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:1
cluster_current_epoch:2
cluster_my_epoch:2
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
POD-IP:6379> get name
(error) CLUSTERDOWN The cluster is down

Next, we check the pod logs for anything else strange (but I don't see anything unusual). For a Master node:

redis-cluster 09:59:53.70                                                                                                                                               
redis-cluster 09:59:53.71 Welcome to the Bitnami redis-cluster container                                                                                                
redis-cluster 09:59:53.71 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-redis-cluster                                              
redis-cluster 09:59:53.71 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-redis-cluster/issues                                          
redis-cluster 09:59:53.71                                                                                                                                               
redis-cluster 09:59:53.72 INFO  ==> ** Starting Redis setup **                                                                                                          
redis-cluster 09:59:53.79 INFO  ==> Initializing Redis                                                                                                                  
redis-cluster 09:59:53.81 INFO  ==> Setting Redis config file                                                                                                           
Changing old IP OLD_IP1 by the new one NEW_IP1
Changing old IP OLD_IP2 by the new one NEW_IP2
Changing old IP OLD_IP3 by the new one NEW_IP3
Changing old IP OLD_IP4 by the new one NEW_IP4
Changing old IP OLD_IP5 by the new one NEW_IP5
Changing old IP OLD_IP6 by the new one NEW_IP6                                                                                                               
redis-cluster 09:59:59.00 INFO  ==> ** Redis setup finished! **                                                                                                         

1:C 12 May 2022 09:59:59.094 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo                                                                                            
1:C 12 May 2022 09:59:59.094 # Redis version=6.2.7, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 12 May 2022 09:59:59.094 # Configuration loaded
1:M 12 May 2022 09:59:59.095 * monotonic clock: POSIX clock_gettime
1:M 12 May 2022 09:59:59.096 * Node configuration loaded, I'm 20872eab65b06e96588969197aadf49d37f20fd1
1:M 12 May 2022 09:59:59.096 # A key '__redis__compare_helper' was added to Lua globals which is not on the globals allow list nor listed on the deny list.
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 6.2.7 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                  
 (    '      ,       .-`  | `,    )     Running in cluster mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           https://redis.io       
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

1:M 12 May 2022 09:59:59.097 # Server initialized
1:M 12 May 2022 09:59:59.097 * Ready to accept connections

For a Slave node:

redis-cluster 09:59:58.77
redis-cluster 09:59:58.77 Welcome to the Bitnami redis-cluster container                                                                                                
redis-cluster 09:59:58.77 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-redis-cluster                                              
redis-cluster 09:59:58.77 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-redis-cluster/issues                                          
redis-cluster 09:59:58.77                                                                                                                                               
redis-cluster 09:59:58.77 INFO  ==> ** Starting Redis setup **                                                                                                          
redis-cluster 09:59:58.80 INFO  ==> Initializing Redis                                                                                                                  
redis-cluster 09:59:58.81 INFO  ==> Setting Redis config file                                                                                                           
Changing old IP OLD_IP1 by the new one NEW_IP1
Changing old IP OLD_IP2 by the new one NEW_IP2
Changing old IP OLD_IP3 by the new one NEW_IP3
Changing old IP OLD_IP4 by the new one NEW_IP4
Changing old IP OLD_IP5 by the new one NEW_IP5
Changing old IP OLD_IP6 by the new one NEW_IP6                                                                                                                
redis-cluster 09:59:58.88 INFO  ==> ** Redis setup finished! **                                                                                                         

1:C 12 May 2022 09:59:58.916 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo                                                                                            
1:C 12 May 2022 09:59:58.916 # Redis version=6.2.7, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 12 May 2022 09:59:58.916 # Configuration loaded
1:M 12 May 2022 09:59:58.917 * monotonic clock: POSIX clock_gettime
1:M 12 May 2022 09:59:58.917 * Node configuration loaded, I'm 45afe603fc411d80c4d495d6e8ecbb89b76bfb2d
1:M 12 May 2022 09:59:58.918 # A key '__redis__compare_helper' was added to Lua globals which is not on the globals allow list nor listed on the deny list.
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 6.2.7 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                  
 (    '      ,       .-`  | `,    )     Running in cluster mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           https://redis.io       
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

1:M 12 May 2022 09:59:58.918 # Server initialized
1:M 12 May 2022 09:59:58.918 * Ready to accept connections

Additional information

carrodher commented 2 years ago

The liveness (and readiness) probes are customizable. The default values are enough to work in the different environments used in our tests, but if that's the issue, you can try to fine-tune those parameters to meet your environment's needs.

See for example https://github.com/bitnami/charts/blob/master/bitnami/redis-cluster/values.yaml#L499

  ## Configure extra options for Redis&trade; liveness probes
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#configure-probes)
  ## @param redis.livenessProbe.enabled Enable livenessProbe
  ## @param redis.livenessProbe.initialDelaySeconds Initial delay seconds for livenessProbe
  ## @param redis.livenessProbe.periodSeconds Period seconds for livenessProbe
  ## @param redis.livenessProbe.timeoutSeconds Timeout seconds for livenessProbe
  ## @param redis.livenessProbe.failureThreshold Failure threshold for livenessProbe
  ## @param redis.livenessProbe.successThreshold Success threshold for livenessProbe
  ##
  livenessProbe:
    enabled: true
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 5
    successThreshold: 1
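
For illustration, an override of those parameters in your own values.yaml could look like the sketch below (the parameter names follow the chart's redis.livenessProbe/redis.readinessProbe values; the concrete numbers are only an example, adjust them to your environment):

redis:
  livenessProbe:
    enabled: true
    initialDelaySeconds: 20
    periodSeconds: 10
    timeoutSeconds: 10
    successThreshold: 1
    failureThreshold: 10
  readinessProbe:
    enabled: true
    initialDelaySeconds: 20
    periodSeconds: 10
    timeoutSeconds: 10
    successThreshold: 1
    failureThreshold: 10
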
ryuzakyl commented 2 years ago

Hey @carrodher. Thanks for the response!

Regarding the fine-tuning of the liveness/readiness probes, I did that with no luck. As I mentioned in the walkthrough/troubleshooting details, I think there's another underlying issue (not the probes) leaving the redis-cluster in a fail state from the very beginning.

Any other pointer would be appreciated.

carrodher commented 2 years ago

Unfortunately, I have been trying to reproduce the issue without luck; the different deployments I am doing succeed as expected. I also checked our automated test & release pipeline, where all the Helm charts are tested on top of different k8s clusters (TKG, AKS, GKE, IKS), and there are no issues. Are you able to reproduce the issue without using a custom values.yaml, just with default parameters?

ryuzakyl commented 2 years ago

Yes, as mentioned in the bug report, by simply running helm install my-release bitnami/redis-cluster I get the errors reported.

Could it be something related to permissions required by the chart (Security Groups in the case of AWS)? I do know that for the redis chart, we have to allow traffic on port 26379 for Sentinel.

Perhaps some extra permissions are required for the redis-cluster chart to deploy correctly?

carrodher commented 2 years ago

Could it be something related to permissions required by the chart (Security Groups in the case of AWS)? I do know that for the redis chart, we have to allow traffic on port 26379 for Sentinel.

It could be an option, yes, although it's not something we are hitting in our tests; maybe it depends on how the cluster/account is configured in your use case.

ryuzakyl commented 2 years ago

Thanks again @carrodher, but I find myself in a dead-end situation here.

Any other pointer or advice on how to troubleshoot this cluster fail status would be appreciated.

carrodher commented 2 years ago

Let's see if someone else reports a similar issue or provides any hint. From my tests, I am not able to reproduce the issue in different environments; in the same way, the different automation we have in place is also working fine using different k8s/Helm versions as well as different clusters.

github-actions[bot] commented 2 years ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] commented 2 years ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

dliffredo commented 2 years ago

Hi, I have the same problem with EKS 1.22. @ryuzakyl, have you solved this problem?

ryuzakyl commented 2 years ago

Hi @dliffredo. Sadly, I could not find a solution for that.

It was taking too much time to troubleshoot this issue, so I decided to switch to Redis with Sentinel enabled.

If I can help you in any way, just let me know.

dliffredo commented 2 years ago

Thanks in advance. The strange thing is that with EKS 1.20 everything always worked correctly without changing the values.yaml, and we have not changed anything except the cluster version.

I continue to investigate in the hope of finding a solution.

ryuzakyl commented 2 years ago

Interesting observation.

I'm on EKS v1.21 and I had issues from the start (even tested with several older versions of the chart). This might narrow it down to the v1.20-v1.21 version change.

mrvisser commented 2 years ago

I am also experiencing this issue -- EKS 1.22.

I have an SG on all my EKS cluster worker nodes that allows traffic on all ports from all worker nodes.

I confirmed that if I take the exact same Terraform project and set the EKS cluster_version to 1.20, the cluster starts up successfully.

baptiste-gaillet commented 2 years ago

Hello.

Same problem on my cluster in the redis-cluster namespace, with the redis-cluster Helm chart.

I have tried to launch it with my custom values and with the original values; the cluster is never launched, because the pods have the event: 'Readiness probe failed: cluster_state:fail'.

Thx.

adecchi-2inno commented 2 years ago

Hi, I got the same problem on GKE 1.24.3, running in the redis namespace, with the redis-cluster Helm chart.

baptiste-gaillet commented 2 years ago

Hi @adecchi-2inno ,

I have opened a new issue here: https://github.com/bitnami/charts/issues/12901

Thx a lot.

jeicopercy commented 1 year ago

I had the same problem, but my cluster is running K8s 1.20. To fix the problem, I used the same version that works fine in another cluster with K8s 1.20: Helm chart redis-cluster 8.2.7, and it works OK!

In my case, I first had another problem with Redis related to its volume; while trying to fix it, I reinstalled Redis, but it never worked fine with the latest version, and it was necessary to return to Helm chart v8.2.7.

I hope other users can test this same solution and confirm whether it works fine for them too.
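
For anyone who wants to try the same workaround, pinning the chart version is simply (the release name is just an example):

helm install my-release bitnami/redis-cluster --version 8.2.7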

aagutu commented 1 year ago

I have the same problem: ERROR: Liveness probe failed / Readiness probe failed.

Has anyone managed to resolve the issue?

bubelef commented 1 year ago

Also having this issue...

doroncarmeli commented 1 year ago

I have the same problem on EKS 1.24 and redis-cluster-8.3.11 (Redis 7.0.9): cluster_state:fail. Does anyone have any insight?

doroncarmeli commented 1 year ago

I see the below message on pod 0:

M: 9e243597a2746e1820dcec977973e6b4726b4151 63.33.105.99:6379
   slots:[0-5460] (5461 slots) master
M: 9047f8b1e93423ab362cb6db623b8586fecce533 52.49.51.101:6379
   slots:[5461-10922] (5462 slots) master
M: 4bf7af4d257a98110f2148e4654c3b6d610ebddb 54.73.69.233:6379
   slots:[10923-16383] (5461 slots) master
S: 294b6e348e3ad353eb3d3874bf474bec3cd4a2a4 54.170.13.129:6379
   replicates 4bf7af4d257a98110f2148e4654c3b6d610ebddb
S: 604f1762e9ce39689ced6c7306593cd1d4737bc9 52.48.237.198:6379
   replicates 9e243597a2746e1820dcec977973e6b4726b4151
S: 27353de1f5709341e6315a7ed3b1f3dbd30262f0 54.170.34.37:6379
   replicates 9047f8b1e93423ab362cb6db623b8586fecce533

Nodes configuration updated
Assign a different config epoch to each node
Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join

doroncarmeli commented 1 year ago

I know the root cause! At least in my case. My EKS cluster is configured to serve external addresses as well as internal addresses: "External" meaning outside of AWS (internet-facing) and "Internal" meaning internal to the VPC (not internal to the EKS). So whenever you set a service type as LoadBalancer, it defaults to "External Services". In the generic bitnami/redis-cluster defaults there are 6 entities that are set as "LoadBalancer", not including the service itself, so altogether 7. Each of the pods is assigned an FQDN that points to an external internet-facing IP address. There are two ways to solve this:

1. Set the EKS cluster to serve internal IP addresses only (that is, VPC-internal, not k8s-internal).
2. Enable an annotation in the values file, similar to the RabbitMQ cluster annotation (set under service type "LoadBalancer"):

service:
  type: LoadBalancer
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"  # < This one

pat-s commented 1 year ago

I am facing the same issue on EKS 1.26.2. The confusing part is that I had a working cluster some days before (with the same config and a fresh PVC & PV).

The suggestions by @doroncarmeli did not help in my case.

Given that so many people still seem to face this issue, I would consider re-opening it and taking a closer look.

chart version: 8.4.3

gorakhgaurav commented 1 year ago

I am also facing the same issue with the openshift on-premise cluster.

delamainer commented 1 year ago

I'm facing the same issue with AWS EKS v1.23.17

east-shine commented 1 year ago

I too am having the same problem.

Events:
  Type     Reason     Age              From               Message
  ----     ------     ----             ----               -------
  Normal   Scheduled  16s              default-scheduler  Successfully assigned infra/redis-cluster-0 to minikube
  Normal   Pulled     15s              kubelet            Container image "docker.io/bitnami/redis-cluster:7.2.0-debian-11-r0" already present on machine
  Normal   Created    15s              kubelet            Created container redis-cluster
  Normal   Started    15s              kubelet            Started container redis-cluster
  Warning  Unhealthy  1s (x2 over 6s)  kubelet            Liveness probe failed: Could not connect to Redis at localhost:6379: Connection refused
  Warning  Unhealthy  1s (x2 over 6s)  kubelet            Readiness probe failed: Could not connect to Redis at localhost:6379: Connection refused

east-shine commented 1 year ago

kubectl get pod

redis-cluster-0   10.244.0.75
redis-cluster-1   10.244.0.76
redis-cluster-2   10.244.0.77

kubectl exec -it redis-cluster-0 -c redis-cluster -- redis-cli

When I checked with the cluster nodes command, it had the wrong address information:

10.244.0.75:6379@16379 myself,master
10.244.0.25:6379@16379 master,fail?
10.244.0.26:6379@16379 master,fail?

So I used the CLUSTER FORGET [node-id] command to clear the wrong addresses, and added the correct addresses using the CLUSTER MEET [correct-ip] [port] command. Clustering was then built fine.
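
Roughly, the repair session looked like the sketch below (node IDs and pod IPs are placeholders; you may need to run FORGET against every healthy node that still remembers the stale entry):

# list known nodes and spot the stale entries
redis-cli -h 10.244.0.75 -p 6379 cluster nodes
# drop a stale node entry (placeholder node ID)
redis-cli -h 10.244.0.75 -p 6379 cluster forget <stale-node-id>
# introduce the correct pod IPs to the cluster
redis-cli -h 10.244.0.75 -p 6379 cluster meet 10.244.0.76 6379
redis-cli -h 10.244.0.75 -p 6379 cluster meet 10.244.0.77 6379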

For reference, my values.yaml is:

cluster:
  init: true
  nodes: 3
  replicas: 0

usePassword: false
password: ''

service:
  type: LoadBalancer
  port: 6379
  name: redis-cluster

And when I injected istio-proxy into all pods, it worked.

CeliaGMqrz commented 1 year ago

Hi,

Thanks for reporting this issue and providing feedback.

I'm sorry, but I wasn't able to reproduce the error. The tests from my env are successful. It seems to be a network issue and could be related to the Istio configuration, but we don't have enough clear information to help you. If you could provide more detailed steps to reproduce the problem, it would be greatly helpful in finding a solution.

Anyway, it appears that you have found a possible fix. We will keep the issue open for community testing and feedback.

pat-s commented 1 year ago

Given how many people have reported this and how often I've encountered the issue myself, I doubt it is a network issue.

I guess nobody really has a detailed idea of where it comes from, and that would aid debugging. I haven't played around with redis-cluster like I did a few weeks/months ago, but I doubt the issue just went away (not impossible, of course, if it was related to an upstream issue).

jeffersonlmartins commented 1 year ago

I think the problem is the health check of the liveness and readiness probes. By default, the health check uses two scripts under the /scripts path of the redis image.

I manually edited the statefulset and changed the default command executed to:

livenessProbe:
  exec:
    command:
      - sh
      - -c
      - redis-cli -h localhost -p 6379 ping

and voilà! It works!

I tried to do the same through the Helm values file, but it didn't work; it didn't change the default value in the statefulset. I tried this configuration in values.yaml:

livenessProbe:
    enabled: false

customLivenessProbe:
  enabled: true
  initialDelaySeconds: 20
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 5
  exec:
    command:
      - sh
      - -c
      - redis-cli -h localhost -p 6379 ping

Does anyone know how to configure a custom probe using the values file?

suryastef commented 1 year ago

Nice research right there, @jeffersonlmartins. Anyway, I tried doing what you did, and it works for me. I guess there is some mis-indentation in your values.yaml file, and customLivenessProbe should be under redis; there is no need to disable the default livenessProbe and readinessProbe, just add the custom exec command. Here is my values.yaml (for reference):

redis:
  customLivenessProbe:
    exec:
      command:
        - sh
        - -c
        - redis-cli -h localhost -p $REDIS_PORT_NUMBER ping
  customReadinessProbe:
    exec:
      command:
        - sh
        - -c
        - redis-cli -h localhost -p $REDIS_PORT_NUMBER ping

One tip: using $REDIS_PORT_NUMBER should be less prone to human error :wink:
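
To apply a values file like the one above, the usual install/upgrade works (the release name is just a placeholder):

helm upgrade --install my-release bitnami/redis-cluster -f values.yaml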

ITLight commented 1 year ago

In my case, I just increased the response timeout from 15 (the default) to 60 in scripts-configmap.yaml, and then everything worked well.

andrewseif commented 1 year ago

I have been running into this problem with no solution in sight.

I think there's a missing piece, which is that --cluster create does not actually create the cluster.
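
As a sanity check, you can try creating the cluster by hand from one of the pods with redis-cli; a sketch using the standard redis-cli cluster-create syntax (pod IPs and the password variable are placeholders):

redis-cli -a "$REDIS_PASSWORD" --cluster create \
  POD_IP_0:6379 POD_IP_1:6379 POD_IP_2:6379 \
  POD_IP_3:6379 POD_IP_4:6379 POD_IP_5:6379 \
  --cluster-replicas 1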

CeliaGMqrz commented 1 year ago

Hi @suryastef, @jeffersonlmartins,

Thanks for your feedback and detailed information. Unfortunately, we have not found a concrete solution because the environments are very different in each case (see some comments on #12901), but this issue is currently on our radar. Anyway, would you like to contribute by creating a PR to solve the issue? Let me know if you need any assistance with the process. The Bitnami team will be happy to review it and provide feedback. Here you can find the contributing guidelines.

scrqkgv4567 commented 1 year ago

I have been troubled by this issue for the past week. It wasn't until today that I found out I have an iptables rule like this: iptables -t nat -p tcp --dport 16379 -j DNAT --to xxxx:xxxx. I think you should first investigate whether the bus port (default:16379) is being interfered with.
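
For example, a quick check on a node for NAT rules touching the bus port (assuming you have root and iptables available) could be:

iptables -t nat -S | grep 16379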

rafzei commented 1 year ago

Had the same issue but for a standalone setup. TL;DR: increase timeoutSeconds in readinessProbe

I've checked, and it turns out that Redis is working as expected. Let's look at the readiness times. The default:

    readinessProbe:
      enabled: true
      initialDelaySeconds: 20
      periodSeconds: 5
      timeoutSeconds: 1
      successThreshold: 1
      failureThreshold: 5

The timeoutSeconds field is used within a Pod's readiness probe to specify the number of seconds that the system should wait for a response from the container before considering the probe to have failed.

I understand that in some setups the response takes only milliseconds, but not in my case, so increasing timeoutSeconds to 10 was the solution.
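
For the redis-cluster chart, that translates into something like the sketch below in values.yaml (the probe block sits under redis; the numbers are just what worked for me, adjust as needed):

redis:
  readinessProbe:
    enabled: true
    initialDelaySeconds: 20
    periodSeconds: 5
    timeoutSeconds: 10
    successThreshold: 1
    failureThreshold: 5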

github-actions[bot] commented 1 year ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] commented 1 year ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

Duck5el commented 5 months ago

So just a quick take on this issue. I encountered a similar problem myself. Fortunately, it was a fresh setup, so I simply deleted the PVCs and the entire STS, then redeployed everything. After that, everything was working fine.

In my case, I think what caused the issue was that I misunderstood the two fields in the Helm chart:

cluster.nodes: The number of master nodes should always be >= 3; otherwise, cluster creation will fail.
cluster.replicas: Number of replicas for every master in the cluster (default is 1).

Initially, I set cluster.nodes to 3, but after realizing it had to be 6, I updated my config. However, the Redis setup didn't want to start anymore, so deleting everything was the fix for me.
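
In values.yaml terms, a consistent sizing would be the following sketch:

cluster:
  nodes: 6      # total nodes: 3 masters + 3 replicas
  replicas: 1   # one replica per master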

cradules commented 4 months ago

Hi. The problem is the following:

The chart comes with a minimal resource preset, "nano". This increases the response time of the application. The liveness and readiness probes are not adjusted for this kind of low resources, and hence the response time is way over what is configured for the liveness and readiness probes.

Solution:

Set minimum "medium" as resourcesPrese (this is for testing only). I believe you need the "large" spec for a dev env (or even more, if you have high activity) and also adjust the timeout of the liveness and readiness

You must delete the cluster, including the PVC(s) if you have already deployed it.

After that you redeploy it with the new values:

  livenessProbe:
    enabled: true
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 5
    successThreshold: 1
    failureThreshold: 5

  readinessProbe:
    enabled: true
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 5
    successThreshold: 1
    failureThreshold: 5

resourcesPreset: "medium"

For production, use custom resource templates and apply the resources you know your production needs.
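
As a hedged example, explicit resources in values.yaml might look like the sketch below (the exact location of the block depends on the chart version; in recent versions it lives under redis alongside resourcesPreset, and the numbers are purely illustrative):

redis:
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "1"
      memory: 2Gi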

johnwc commented 2 weeks ago

(Quoting @Duck5el's comment above about cluster.nodes / cluster.replicas and deleting the PVCs and the statefulset.)

I came here from searching for the same error the OP stated: the cluster was in a failed state. Having been confused by the config as well, we also set it to 3 on the initial deployment. After following this (removing the deployment as well as the PVCs), it then deployed without the failed state.