davidpechcz opened this issue 7 months ago
I am not quite sure which resources are relevant to the bug, as we run one Redis, one RedisCluster, and one RedisReplication in the same namespace for this project.
```yaml
apiVersion: redis.redis.opstreelabs.in/v1beta2
kind: Redis
metadata:
  annotations:
  creationTimestamp: "2023-11-29T08:03:12Z"
  finalizers:
  - redisFinalizer
  generation: 7
  labels:
    app.kubernetes.io/instance: makro-master
  name: redis-prometheus
  namespace: makro-master
  resourceVersion: "100211488"
  uid: 633b26e0-ce5f-4fd2-988e-99a40093d6a2
spec:
  kubernetesConfig:
    image: quay.io/opstree/redis:v7.0.11
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
    updateStrategy: {}
  livenessProbe:
    failureThreshold: 3
    initialDelaySeconds: 1
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
  readinessProbe:
    failureThreshold: 3
    initialDelaySeconds: 1
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
  redisConfig:
    additionalRedisConfig: redis-prometheus-configmap
  redisExporter:
    enabled: true
    image: oliver006/redis_exporter:v1.50.0-alpine
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 200m
        memory: 64Mi
      requests:
        cpu: 200m
        memory: 64Mi
```
```yaml
apiVersion: redis.redis.opstreelabs.in/v1beta2
kind: RedisCluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"redis.redis.opstreelabs.in/v1beta2","kind":"RedisCluster","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"makro-master"},"name":"redis-cache","namespace":"makro-master"},"spec":{"clusterSize":3,"kubernetesConfig":{"image":"quay.io/opstree/redis:v7.0.12","imagePullPolicy":"IfNotPresent","resources":{"requests":{"cpu":"100m","memory":"128Mi"}}},"persistenceEnabled":false,"podSecurityContext":{"fsGroup":1000,"runAsUser":1000},"redisExporter":{"enabled":true,"image":"oliver006/redis_exporter:v1.50.0-alpine","imagePullPolicy":"IfNotPresent","resources":{"limits":{"cpu":"200m","memory":"64Mi"},"requests":{"cpu":"200m","memory":"64Mi"}}}}}
  creationTimestamp: "2023-11-29T12:40:01Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2023-11-29T12:43:18Z"
  finalizers:
  - redisClusterFinalizer
  generation: 4
  labels:
    app.kubernetes.io/instance: makro-master
  name: redis-cache
  namespace: makro-master
  resourceVersion: "100209752"
  uid: c0966c4a-ba0c-4bdf-8d9b-31dd85449778
spec:
  clusterSize: 3
  clusterVersion: v7
  kubernetesConfig:
    image: quay.io/opstree/redis:v7.0.12
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
    updateStrategy: {}
  persistenceEnabled: false
  podSecurityContext:
    fsGroup: 1000
    runAsUser: 1000
  redisExporter:
    enabled: true
    image: oliver006/redis_exporter:v1.50.0-alpine
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 200m
        memory: 64Mi
      requests:
        cpu: 200m
        memory: 64Mi
  redisFollower:
    livenessProbe:
      failureThreshold: 3
      initialDelaySeconds: 1
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    readinessProbe:
      failureThreshold: 3
      initialDelaySeconds: 1
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
  redisLeader:
    livenessProbe:
      failureThreshold: 3
      initialDelaySeconds: 1
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    readinessProbe:
      failureThreshold: 3
      initialDelaySeconds: 1
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
status:
  readyFollowerReplicas: 0
  readyLeaderReplicas: 0
  reason: RedisCluster is initializing leaders
  state: Initializing
```
The operator came back online once I removed the `redisClusterFinalizer` finalizer from the RedisCluster named `redis-cache`.
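For anyone hitting the same deadlock: the finalizer can be cleared with `kubectl edit`/`kubectl patch`, or programmatically. Below is a minimal Go sketch using controller-runtime's client; the group/version/kind, namespace, and name come from the manifest above, everything else is illustrative and not part of the operator:

```go
package main

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

func main() {
	// Build a client from the local kubeconfig (or in-cluster config).
	c, err := client.New(config.GetConfigOrDie(), client.Options{})
	if err != nil {
		panic(err)
	}

	// Fetch the RedisCluster as unstructured, so the operator's API types
	// are not needed as a dependency.
	rc := &unstructured.Unstructured{}
	rc.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "redis.redis.opstreelabs.in",
		Version: "v1beta2",
		Kind:    "RedisCluster",
	})
	ctx := context.Background()
	key := client.ObjectKey{Namespace: "makro-master", Name: "redis-cache"}
	if err := c.Get(ctx, key, rc); err != nil {
		panic(err)
	}

	// Drop all finalizers (here only redisClusterFinalizer) so the
	// pending deletion can complete.
	rc.SetFinalizers(nil)
	if err := c.Update(ctx, rc); err != nil {
		panic(err)
	}
}
```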
When we tore everything in the namespace down, the operator came back online. When we then applied only the `redis-cache` cluster (see above), the operator entered another crash loop:
{"level":"info","ts":1701264296.5407295,"logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1701264296.5417504,"logger":"setup","msg":"starting manager"}
{"level":"info","ts":1701264296.542449,"msg":"Starting server","kind":"health probe","addr":"[::]:8081"}
I1129 13:24:56.542670 1 leaderelection.go:248] attempting to acquire leader lease redis-operator/6cab913b.redis.opstreelabs.in...
{"level":"info","ts":1701264296.5430605,"msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:8080"}
I1129 13:25:13.885013 1 leaderelection.go:258] successfully acquired lease redis-operator/6cab913b.redis.opstreelabs.in
{"level":"info","ts":1701264313.885664,"logger":"controller.redisreplication","msg":"Starting EventSource","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisReplication","source":"kind source: *v1beta2.RedisReplication"}
{"level":"info","ts":1701264313.8857813,"logger":"controller.redisreplication","msg":"Starting Controller","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisReplication"}
{"level":"info","ts":1701264313.8858335,"logger":"controller.redis","msg":"Starting EventSource","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"Redis","source":"kind source: *v1beta2.Redis"}
{"level":"info","ts":1701264313.885918,"logger":"controller.redis","msg":"Starting Controller","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"Redis"}
{"level":"info","ts":1701264313.8863041,"logger":"controller.rediscluster","msg":"Starting EventSource","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisCluster","source":"kind source: *v1beta2.RedisCluster"}
{"level":"info","ts":1701264313.886352,"logger":"controller.rediscluster","msg":"Starting Controller","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisCluster"}
{"level":"info","ts":1701264313.8865294,"logger":"controller.redissentinel","msg":"Starting EventSource","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisSentinel","source":"kind source: *v1beta2.RedisSentinel"}
{"level":"info","ts":1701264313.886581,"logger":"controller.redissentinel","msg":"Starting Controller","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisSentinel"}
{"level":"info","ts":1701264313.9870625,"logger":"controller.redisreplication","msg":"Starting workers","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisReplication","worker count":1}
{"level":"info","ts":1701264313.987052,"logger":"controller.rediscluster","msg":"Starting workers","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisCluster","worker count":1}
{"level":"info","ts":1701264313.9873176,"logger":"controllers.RedisReplication","msg":"Reconciling opstree redis replication controller","Request.Namespace":"makro-master","Request.Name":"redis-session"}
{"level":"info","ts":1701264313.987473,"logger":"controller.redis","msg":"Starting workers","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"Redis","worker count":1}
{"level":"info","ts":1701264313.987617,"logger":"controllers.Redis","msg":"Reconciling opstree redis controller","Request.Namespace":"makro-acc","Request.Name":"redis-cache"}
{"level":"info","ts":1701264313.988288,"logger":"controller.redissentinel","msg":"Starting workers","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisSentinel","worker count":1}
{"level":"info","ts":1701264314.0525432,"logger":"KubeAPIWarningLogger","msg":"unknown field \"spec.storage.volumeClaimTemplate.metadata.creationTimestamp\""}
{"level":"error","ts":1701264314.069905,"logger":"controller.redisreplication","msg":"Reconciler error","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisReplication","name":"redis-session","namespace":"makro-master","error":"Operation cannot be fulfilled on redisreplications.redis.redis.opstreelabs.in \"redis-session\": StorageError: invalid object, Code: 4, Key: /registry/redis.redis.opstreelabs.in/redisreplications/makro-master/redis-session, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 5cde5644-4f16-4266-b559-a712bc595d13, UID in object meta: ","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":1701264314.0700433,"logger":"controllers.RedisReplication","msg":"Reconciling opstree redis replication controller","Request.Namespace":"makro-master","Request.Name":"redis-session"}
{"level":"info","ts":1701264314.0756187,"logger":"controllers.RedisReplication","msg":"Reconciling opstree redis replication controller","Request.Namespace":"makro-master","Request.Name":"redis-session"}
{"level":"info","ts":1701264314.1333895,"logger":"controllers.Redis","msg":"Will reconcile redis operator in again 10 seconds","Request.Namespace":"makro-acc","Request.Name":"redis-cache"}
{"level":"info","ts":1701264314.1335533,"logger":"controllers.Redis","msg":"Reconciling opstree redis controller","Request.Namespace":"makro-acc","Request.Name":"redis-prometheus"}
{"level":"info","ts":1701264314.2288787,"logger":"controllers.Redis","msg":"Will reconcile redis operator in again 10 seconds","Request.Namespace":"makro-acc","Request.Name":"redis-prometheus"}
{"level":"info","ts":1701264314.2289617,"logger":"controllers.Redis","msg":"Reconciling opstree redis controller","Request.Namespace":"makro-acc","Request.Name":"redis-session"}
{"level":"info","ts":1701264314.314199,"logger":"controllers.Redis","msg":"Will reconcile redis operator in again 10 seconds","Request.Namespace":"makro-acc","Request.Name":"redis-session"}
{"level":"info","ts":1701264324.1342144,"logger":"controllers.Redis","msg":"Reconciling opstree redis controller","Request.Namespace":"makro-acc","Request.Name":"redis-cache"}
{"level":"info","ts":1701264324.2097836,"logger":"controllers.Redis","msg":"Will reconcile redis operator in again 10 seconds","Request.Namespace":"makro-acc","Request.Name":"redis-cache"}
{"level":"info","ts":1701264324.2294283,"logger":"controllers.Redis","msg":"Reconciling opstree redis controller","Request.Namespace":"makro-acc","Request.Name":"redis-prometheus"}
{"level":"info","ts":1701264324.2831347,"logger":"controllers.Redis","msg":"Will reconcile redis operator in again 10 seconds","Request.Namespace":"makro-acc","Request.Name":"redis-prometheus"}
{"level":"info","ts":1701264324.314352,"logger":"controllers.Redis","msg":"Reconciling opstree redis controller","Request.Namespace":"makro-acc","Request.Name":"redis-session"}
{"level":"info","ts":1701264324.3573327,"logger":"controllers.Redis","msg":"Will reconcile redis operator in again 10 seconds","Request.Namespace":"makro-acc","Request.Name":"redis-session"}
{"level":"info","ts":1701264334.2104197,"logger":"controllers.Redis","msg":"Reconciling opstree redis controller","Request.Namespace":"makro-acc","Request.Name":"redis-cache"}
{"level":"info","ts":1701264334.2564874,"logger":"controllers.Redis","msg":"Will reconcile redis operator in again 10 seconds","Request.Namespace":"makro-acc","Request.Name":"redis-cache"}
{"level":"info","ts":1701264334.28431,"logger":"controllers.Redis","msg":"Reconciling opstree redis controller","Request.Namespace":"makro-acc","Request.Name":"redis-prometheus"}
{"level":"info","ts":1701264334.3261616,"logger":"controllers.Redis","msg":"Will reconcile redis operator in again 10 seconds","Request.Namespace":"makro-acc","Request.Name":"redis-prometheus"}
{"level":"info","ts":1701264334.3586192,"logger":"controllers.Redis","msg":"Reconciling opstree redis controller","Request.Namespace":"makro-acc","Request.Name":"redis-session"}
{"level":"info","ts":1701264334.4050875,"logger":"controllers.Redis","msg":"Will reconcile redis operator in again 10 seconds","Request.Namespace":"makro-acc","Request.Name":"redis-session"}
{"level":"info","ts":1701264339.4776044,"logger":"controllers.RedisCluster","msg":"Reconciling opstree redis Cluster controller","Request.Namespace":"makro-master","Request.Name":"redis-cache"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x142fc49]
goroutine 235 [running]:
github.com/OT-CONTAINER-KIT/redis-operator/k8sutils.generateRedisClusterParams(_, _, _, {{0x177589a, 0x6}, 0x0, 0x0, 0x0, 0x0, 0xc000b68390, ...})
/workspace/k8sutils/redis-cluster.go:33 +0xc9
github.com/OT-CONTAINER-KIT/redis-operator/k8sutils.RedisClusterSTS.CreateRedisClusterSetup({{0x177589a, 0x6}, 0x0, 0x0, 0x0, 0x0, 0xc000b68390, 0xc000b68378, 0x0, 0x0}, ...)
/workspace/k8sutils/redis-cluster.go:218 +0x2fb
github.com/OT-CONTAINER-KIT/redis-operator/k8sutils.CreateRedisLeader(0xc0003a8b00)
/workspace/k8sutils/redis-cluster.go:167 +0xf8
github.com/OT-CONTAINER-KIT/redis-operator/controllers.(*RedisClusterReconciler).Reconcile(0xc0001d0f00, {0xc000ba4840, 0x15a4a00}, {{{0xc000bac1a0, 0x16b3fa0}, {0xc000bac190, 0x30}}})
/workspace/controllers/rediscluster_controller.go:105 +0x3bb
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc0000ee580, {0x1998df8, 0xc000ba4840}, {{{0xc000bac1a0, 0x16b3fa0}, {0xc000bac190, 0x413a34}}})
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:114 +0x26f
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0000ee580, {0x1998d50, 0xc0000c9040}, {0x15fc820, 0xc00046e2a0})
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:311 +0x33e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0000ee580, {0x1998d50, 0xc0000c9040})
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:223 +0x357
After some Go debugging, the problem is that some fields of `RedisCluster.spec` are not initialized to their defaults but are left as nil values. This particular YAML was fixed by providing the defaults explicitly:
```yaml
apiVersion: redis.redis.opstreelabs.in/v1beta2
kind: RedisCluster
metadata:
  name: redis-cache
spec:
  clusterSize: 3
  redisLeader: # THIS MUST BE included, otherwise Go panics
    replicas: 3
  redisFollower: # THIS MUST BE included, otherwise Go panics
    replicas: 3
  persistenceEnabled: false
  storage: # THIS MUST BE included, otherwise Go panics
    #volumeClaimTemplate:
    #  spec:
    #    storageClassName: {{ $.Values.redis.storageClassName | quote }}
  nodeConfVolume: false
  podSecurityContext:
    runAsUser: 1000
    fsGroup: 1000
  kubernetesConfig:
    image: quay.io/opstree/redis:v7.0.12
    imagePullPolicy: IfNotPresent
  redisExporter:
    enabled: true
    image: oliver006/redis_exporter:v1.50.0-alpine
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 200m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 64Mi
```
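To illustrate what I believe is happening (a hypothetical sketch with invented type names, not the operator's actual code from `k8sutils/redis-cluster.go`): optional spec sections map to pointer fields, and dereferencing them without a nil check produces exactly this kind of SIGSEGV, while a guarded accessor could fall back to defaults instead:

```go
package main

import "fmt"

// Simplified stand-ins for the CRD types; in the real API these
// sub-structs are pointers because they are optional in the YAML.
type Storage struct{ KeepAfterDelete bool }

type RedisLeader struct{ Replicas *int32 }

type RedisClusterSpec struct {
	ClusterSize *int32
	RedisLeader *RedisLeader
	Storage     *Storage
}

// Buggy pattern: if `redisLeader` (or `replicas`) is omitted in the YAML,
// the pointer is nil and the dereference panics, matching the SIGSEGV in
// generateRedisClusterParams.
func replicasUnsafe(spec RedisClusterSpec) int32 {
	return *spec.RedisLeader.Replicas // panics when RedisLeader or Replicas is nil
}

// Defensive version: fall back to a sensible default when the optional
// field was not set, instead of requiring it in the manifest.
func replicasSafe(spec RedisClusterSpec) int32 {
	if spec.RedisLeader != nil && spec.RedisLeader.Replicas != nil {
		return *spec.RedisLeader.Replicas
	}
	if spec.ClusterSize != nil {
		return *spec.ClusterSize // default the leader count to clusterSize
	}
	return 3
}

func main() {
	size := int32(3)
	spec := RedisClusterSpec{ClusterSize: &size} // redisLeader omitted, as in our manifest
	fmt.Println(replicasSafe(spec))              // prints 3
	// replicasUnsafe(spec) would panic: nil pointer dereference
}
```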
Is this somehow related to #712 and disabling the "conversion webhook"? Or is the problem simply that the missing YAML values should either be reported as missing or be filled in with defaults?
Have you enabled the conversion webhook? It is only in place if you enable it; by default we keep it off.
Hi, we use Helm chart 0.15.9, but with the conversion webhook CRDs commented out (from #712).
No, the webhook was not enabled.
**What version of redis operator are you using?**

Helm chart 0.15.9, redis-operator version 0.15.1.

**Does this issue reproduce with the latest release?**

Yes.

**What did you do?**

With ArgoCD we deleted the resources and tried to recreate them a few seconds later. The problem is that the original cluster did not have enough time to tear down all of its resources, so some of them got stuck and were not cleaned up before the new cluster creation was attempted. We cleaned up the stuck resources manually, but the operator still crashes each time it is restarted. Currently no PVCs are present for this cluster, and RBAC or anything similar is not the problem. No relevant events were generated.

**What did you expect to see?**

We expected the operator not to crash.