RedisLabs / redis-enterprise-k8s-docs


REC Cluster doesn't deploy - Azure AKS #275

Open aammundi opened 1 week ago

aammundi commented 1 week ago

Hi, I am deploying a REC cluster on Azure AKS and have followed the steps here: https://redis.io/docs/latest/operate/kubernetes/deployment/quick-start/

I have made several attempts, and in almost all of them the Redis pods go into CrashLoopBackOff and are eventually killed. The experiment I'm doing is fairly repeatable:

  1. Create a resource group (Azure)
  2. Create the k8s cluster
  3. (Try to) deploy REC - the REC pods never come up
  4. Delete the resource group (which deletes all underlying resources, PVCs, etc.)
  5. Go back to step 1

The reason for these iterations is that I had issues with node pools and similar, and eliminated those issues one at a time. Once I had the right node pools and requests/limits in place, the cluster did come up once, at which point I decided to formalize and clean up my code and retry from the top.

However, it's back to the crash loop.

Some observations from the logs:

2024-07-03 00:40:52,210 - services-rigger.rs - INFO - got an exception while trying to communicate with Redis Enterprise cluster: HTTPSConnectionPool(host='redis', port=9443): Max retries exceeded with url: /v1/nodes (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f1d1988ac70>: Failed to establish a new connection: [Errno 111] Connection refused'))

'redis' is the ClusterIP service. I checked with dnsutils and the name is resolvable:

kubectl exec -i -t dnsutils -- nslookup redis
Server:     10.0.0.10
Address:    10.0.0.10#53

Name:   redis.ttinfra.svc.cluster.local
Address: 10.0.44.57
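
Note that a successful nslookup and "[Errno 111] Connection refused" are not in conflict: the refusal happens at the TCP level, after the name has already resolved, and usually means nothing is accepting connections on that port yet. A minimal sketch that separates the two stages is below. The function name `probe` and its return labels are hypothetical, chosen only for illustration; the host 'redis' and port 9443 are taken from the log line above.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a connection attempt as one of:
    'dns_failure', 'refused', 'timeout', 'other_error', 'open'.
    (Hypothetical helper; the labels are illustrative only.)
    """
    try:
        # Stage 1: name resolution only. This is what nslookup verifies.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    family, socktype, proto, _, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        try:
            # Stage 2: TCP handshake against the resolved address.
            sock.connect(sockaddr)
        except ConnectionRefusedError:
            # Errno 111 on Linux: the address is reachable, but nothing
            # accepted the connection on this port.
            return "refused"
        except socket.timeout:
            return "timeout"
        except OSError:
            return "other_error"
    return "open"


if __name__ == "__main__":
    # Run from a pod in the same namespace as the REC.
    print(probe("redis", 9443))
```

If this prints "refused" while nslookup succeeds, one thing worth checking (an assumption, not a confirmed diagnosis) is whether the service has any ready endpoints, for example with `kubectl get endpoints redis`, since crash-looping pods would not be listed there.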

Attached is a log generated by log_collector: redis_enterprise_k8s_debug_info_20240702-181115.tar.gz