OT-CONTAINER-KIT / redis-operator

A Golang-based Redis operator that creates and manages Redis standalone, cluster, replication, and sentinel setups on top of Kubernetes.
https://ot-redis-operator.netlify.app/
Apache License 2.0
734 stars, 207 forks

Operator pod crashes after deploying redis-cluster #1014

Closed: rackep closed this issue 2 days ago

rackep commented 2 days ago

What version of redis operator are you using?

redis-operator version: 0.17.0
redis-operator Helm chart version: 0.16.4

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (kubectl version)?

kubectl version output:
$ kubectl version
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.10

What did you do?

Installed the operator and a redis-cluster with the Helm charts, using the following commands:

helm upgrade redis-operator ot-helm/redis-operator --install --create-namespace --namespace ot-operators

helm upgrade redis-cluster ot-helm/redis-cluster --set redisCluster.clusterSize=3 --install --namespace ot-operators

Helm releases

helm list
NAME            NAMESPACE       REVISION    UPDATED                                 STATUS      CHART                   APP VERSION
redis-cluster   ot-operators    1           2024-06-27 13:36:14.325814 +0200 CEST   deployed    redis-cluster-0.16.0    0.16.0     
redis-operator  ot-operators    1           2024-06-27 13:32:26.307482 +0200 CEST   deployed    redis-operator-0.16.4   0.17.0

What did you expect to see?

The Redis cluster pods to be deployed.

What did you see instead?

After installing the redis-cluster custom resource with the Helm chart, the redis-operator pod crashes and goes into CrashLoopBackOff.

kubectl get pods -n ot-operators redis-operator-6dd8d4589c-6b4m8 --watch
NAME                              READY   STATUS    RESTARTS   AGE
redis-operator-6dd8d4589c-6b4m8   1/1     Running   0          2m17s
redis-operator-6dd8d4589c-6b4m8   0/1     Error     0          3m49s
redis-operator-6dd8d4589c-6b4m8   1/1     Running   1 (2s ago)   3m50s
redis-operator-6dd8d4589c-6b4m8   0/1     Error     1 (19s ago)   4m7s
redis-operator-6dd8d4589c-6b4m8   0/1     CrashLoopBackOff   1 (13s ago)   4m19s
redis-operator-6dd8d4589c-6b4m8   1/1     Running            2 (14s ago)   4m20s
redis-operator-6dd8d4589c-6b4m8   0/1     Error              2 (33s ago)   4m39s
redis-operator-6dd8d4589c-6b4m8   0/1     CrashLoopBackOff   2 (16s ago)   4m54s
redis-operator-6dd8d4589c-6b4m8   1/1     Running            3 (30s ago)   5m8s
redis-operator-6dd8d4589c-6b4m8   0/1     Error              3 (46s ago)   5m24s
redis-operator-6dd8d4589c-6b4m8   0/1     CrashLoopBackOff   3 (13s ago)   5m36s
redis-operator-6dd8d4589c-6b4m8   1/1     Running            4 (51s ago)   6m14s
redis-operator-6dd8d4589c-6b4m8   0/1     Error              4 (67s ago)   6m30s

Operator pod log

k logs -n ot-operators redis-operator-6dd8d4589c-6b4m8 -f     
{"level":"info","ts":"2024-06-27T11:32:31Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2024-06-27T11:32:31Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":"2024-06-27T11:32:31Z","logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8080","secure":false}
{"level":"info","ts":"2024-06-27T11:32:31Z","msg":"starting server","kind":"health probe","addr":"[::]:8081"}
I0627 11:32:31.390227       1 leaderelection.go:250] attempting to acquire leader lease ot-operators/6cab913b.redis.opstreelabs.in...
I0627 11:32:31.396005       1 leaderelection.go:260] successfully acquired lease ot-operators/6cab913b.redis.opstreelabs.in
{"level":"info","ts":"2024-06-27T11:32:31Z","msg":"Starting EventSource","controller":"redis","controllerGroup":"redis.redis.opstreelabs.in","controllerKind":"Redis","source":"kind source: *v1beta2.Redis"}
{"level":"info","ts":"2024-06-27T11:32:31Z","msg":"Starting Controller","controller":"redis","controllerGroup":"redis.redis.opstreelabs.in","controllerKind":"Redis"}
{"level":"info","ts":"2024-06-27T11:32:31Z","msg":"Starting EventSource","controller":"rediscluster","controllerGroup":"redis.redis.opstreelabs.in","controllerKind":"RedisCluster","source":"kind source: *v1beta2.RedisCluster"}
{"level":"info","ts":"2024-06-27T11:32:31Z","msg":"Starting Controller","controller":"rediscluster","controllerGroup":"redis.redis.opstreelabs.in","controllerKind":"RedisCluster"}
{"level":"info","ts":"2024-06-27T11:32:31Z","msg":"Starting EventSource","controller":"redisreplication","controllerGroup":"redis.redis.opstreelabs.in","controllerKind":"RedisReplication","source":"kind source: *v1beta2.RedisReplication"}
{"level":"info","ts":"2024-06-27T11:32:31Z","msg":"Starting Controller","controller":"redisreplication","controllerGroup":"redis.redis.opstreelabs.in","controllerKind":"RedisReplication"}
{"level":"info","ts":"2024-06-27T11:32:31Z","msg":"Starting EventSource","controller":"redissentinel","controllerGroup":"redis.redis.opstreelabs.in","controllerKind":"RedisSentinel","source":"kind source: *v1beta2.RedisSentinel"}
{"level":"info","ts":"2024-06-27T11:32:31Z","msg":"Starting EventSource","controller":"redissentinel","controllerGroup":"redis.redis.opstreelabs.in","controllerKind":"RedisSentinel","source":"kind source: *v1beta2.RedisReplication"}
{"level":"info","ts":"2024-06-27T11:32:31Z","msg":"Starting Controller","controller":"redissentinel","controllerGroup":"redis.redis.opstreelabs.in","controllerKind":"RedisSentinel"}
{"level":"info","ts":"2024-06-27T11:32:31Z","msg":"Starting workers","controller":"rediscluster","controllerGroup":"redis.redis.opstreelabs.in","controllerKind":"RedisCluster","worker count":1}
{"level":"info","ts":"2024-06-27T11:32:31Z","msg":"Starting workers","controller":"redisreplication","controllerGroup":"redis.redis.opstreelabs.in","controllerKind":"RedisReplication","worker count":1}
{"level":"info","ts":"2024-06-27T11:32:31Z","msg":"Starting workers","controller":"redis","controllerGroup":"redis.redis.opstreelabs.in","controllerKind":"Redis","worker count":1}
{"level":"info","ts":"2024-06-27T11:32:31Z","msg":"Starting workers","controller":"redissentinel","controllerGroup":"redis.redis.opstreelabs.in","controllerKind":"RedisSentinel","worker count":1}
{"level":"info","ts":"2024-06-27T11:36:14Z","logger":"controllers.RedisCluster","msg":"Reconciling opstree redis Cluster controller","Request.Namespace":"ot-operators","Request.Name":"redis-cluster"}
{"level":"info","ts":"2024-06-27T11:36:14Z","logger":"KubeAPIWarningLogger","msg":"unknown field \"spec.storage.nodeConfVolumeClaimTemplate.metadata.creationTimestamp\""}
{"level":"info","ts":"2024-06-27T11:36:14Z","logger":"KubeAPIWarningLogger","msg":"unknown field \"spec.storage.volumeClaimTemplate.metadata.creationTimestamp\""}
{"level":"error","ts":"2024-06-27T11:36:14Z","logger":"controllers.RedisCluster","msg":"Error in getting Redis pod IP","namespace":"ot-operators","podName":"redis-cluster-leader-0","error":"pods \"redis-cluster-leader-0\" not found","stacktrace":"github.com/OT-CONTAINER-KIT/redis-operator/k8sutils.getRedisServerIP\n\t/workspace/k8sutils/redis.go:34\ngithub.com/OT-CONTAINER-KIT/redis-operator/k8sutils.getRedisServerAddress\n\t/workspace/k8sutils/redis.go:57\ngithub.com/OT-CONTAINER-KIT/redis-operator/k8sutils.configureRedisClient\n\t/workspace/k8sutils/redis.go:389\ngithub.com/OT-CONTAINER-KIT/redis-operator/k8sutils.CheckRedisNodeCount\n\t/workspace/k8sutils/redis.go:297\ngithub.com/OT-CONTAINER-KIT/redis-operator/controllers.(*RedisClusterReconciler).Reconcile\n\t/workspace/controllers/rediscluster_controller.go:77\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227"}
{"level":"error","ts":"2024-06-27T11:36:14Z","logger":"controllers.RedisCluster","msg":"Error in getting Redis cluster nodes","error":"dial tcp :6379: connect: connection refused","stacktrace":"github.com/OT-CONTAINER-KIT/redis-operator/k8sutils.checkRedisCluster\n\t/workspace/k8sutils/redis.go:232\ngithub.com/OT-CONTAINER-KIT/redis-operator/k8sutils.CheckRedisNodeCount\n\t/workspace/k8sutils/redis.go:300\ngithub.com/OT-CONTAINER-KIT/redis-operator/controllers.(*RedisClusterReconciler).Reconcile\n\t/workspace/controllers/rediscluster_controller.go:77\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227"}
W0627 11:36:14.549565       1 warnings.go:70] unknown field "spec.storage.nodeConfVolumeClaimTemplate.metadata.creationTimestamp"
W0627 11:36:14.549578       1 warnings.go:70] unknown field "spec.storage.volumeClaimTemplate.metadata.creationTimestamp"
{"level":"info","ts":"2024-06-27T11:36:14Z","msg":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference","controller":"rediscluster","controllerGroup":"redis.redis.opstreelabs.in","controllerKind":"RedisCluster","RedisCluster":{"name":"redis-cluster","namespace":"ot-operators"},"namespace":"ot-operators","name":"redis-cluster","reconcileID":"12174c0c-f656-48c2-9cec-58c004b60415"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x12a4df0]

goroutine 148 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:116 +0x1a4
panic({0x14ce6a0?, 0x2788ce0?})
    /usr/local/go/src/runtime/panic.go:914 +0x218
github.com/OT-CONTAINER-KIT/redis-operator/k8sutils.getProbeInfo(0x0, 0x7?, 0x0, 0x0)
    /workspace/k8sutils/statefulset.go:617 +0x3d0
github.com/OT-CONTAINER-KIT/redis-operator/k8sutils.generateContainerDef({_, _}, {{0x40006e6260, 0x1d}, {0x400064c224, 0xc}, 0x0, 0x0, {0x40006b8210, 0x26}, ...}, ...)
    /workspace/k8sutils/statefulset.go:369 +0x114
github.com/OT-CONTAINER-KIT/redis-operator/k8sutils.generateStatefulSetsDef({{0x40007d6af8, 0x14}, {0x0, 0x0}, {0x400064c210, 0xc}, {0x0, 0x0}, {0x0, 0x0}, ...}, ...)
    /workspace/k8sutils/statefulset.go:234 +0x354
github.com/OT-CONTAINER-KIT/redis-operator/k8sutils.CreateOrUpdateStateFul({_, _}, {{_, _}, _}, {_, _}, {{0x40007d6af8, 0x14}, {0x0, ...}, ...}, ...)
    /workspace/k8sutils/statefulset.go:100 +0x140
github.com/OT-CONTAINER-KIT/redis-operator/k8sutils.RedisClusterSTS.CreateRedisClusterSetup({{0x1756058, 0x6}, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, ...)
    /workspace/k8sutils/redis-cluster.go:270 +0x7a4
github.com/OT-CONTAINER-KIT/redis-operator/k8sutils.CreateRedisLeader(0x1756058?, {0x1a11590?, 0x40001ac820?})
    /workspace/k8sutils/redis-cluster.go:222 +0xd0
github.com/OT-CONTAINER-KIT/redis-operator/controllers.(*RedisClusterReconciler).Reconcile(0x40004f0230, {0x19f7c28, 0x400074ab40}, {{{0x400062abe0?, 0x5?}, {0x400062abd0?, 0x400043dcf8?}}})
    /workspace/controllers/rediscluster_controller.go:117 +0x53c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x19faf10?, {0x19f7c28?, 0x400074ab40?}, {{{0x400062abe0?, 0xb?}, {0x400062abd0?, 0x0?}}})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:119 +0x8c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0x40004ee780, {0x19f7c60, 0x400047d5e0}, {0x1585600?, 0x4000028fe0?})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:316 +0x2e8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0x40004ee780, {0x19f7c60, 0x400047d5e0})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266 +0x16c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227 +0x74
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 81
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:223 +0x43c
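Reading the trace bottom-up, the panic is raised inside `k8sutils.getProbeInfo` (`statefulset.go:617`) while `CreateRedisLeader` builds the leader StatefulSet, and the first argument of that call is printed as `0x0`, which suggests a nil pointer was passed in. controller-runtime logs the panic ("Observed a panic in reconciler") and then re-raises it, so the whole operator process exits and the pod restarts. The operator's source is not quoted in this issue, so the following is only a minimal Go sketch with hypothetical names (`ProbeSettings`, `Probe`, `probeUnsafe`, `probeSafe`, `panics`) showing how an unguarded dereference of an unset probe pointer produces the same runtime error, and how a nil check with defaults avoids it.

```go
package main

import "fmt"

// ProbeSettings is a HYPOTHETICAL stand-in for an optional probe block in a
// custom resource spec; it is not the redis-operator's real type.
type ProbeSettings struct {
	InitialDelaySeconds int32
	PeriodSeconds       int32
	FailureThreshold    int32
}

// Probe is a HYPOTHETICAL, simplified container probe built from the settings.
type Probe struct {
	InitialDelaySeconds int32
	PeriodSeconds       int32
	FailureThreshold    int32
}

// probeUnsafe dereferences s unconditionally. If the spec never set the
// field (s == nil), reading s.InitialDelaySeconds triggers
// "runtime error: invalid memory address or nil pointer dereference".
func probeUnsafe(s *ProbeSettings) *Probe {
	return &Probe{
		InitialDelaySeconds: s.InitialDelaySeconds,
		PeriodSeconds:       s.PeriodSeconds,
		FailureThreshold:    s.FailureThreshold,
	}
}

// probeSafe substitutes illustrative defaults (assumed values) when no
// settings were supplied, so the dereference can no longer hit nil.
func probeSafe(s *ProbeSettings) *Probe {
	if s == nil {
		s = &ProbeSettings{InitialDelaySeconds: 1, PeriodSeconds: 10, FailureThreshold: 3}
	}
	return probeUnsafe(s)
}

// panics runs f, recovers any panic and reports whether one happened.
// Unlike controller-runtime in the log above, it does not re-raise the panic.
func panics(f func()) (didPanic bool) {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println("probeUnsafe(nil) panicked:", panics(func() { probeUnsafe(nil) }))
	fmt.Printf("probeSafe(nil) = %+v\n", *probeSafe(nil))
}
```

In the sketch the nil check lives in the helper itself; defaulting or validating the spec earlier (for example in the reconciler, or in a defaulting webhook) would be an equally plausible place for it. Which approach the operator actually takes is not stated in this issue.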

drivebyer commented 2 days ago

Same issue as https://github.com/OT-CONTAINER-KIT/redis-operator/issues/1006. Please track it in that issue.