OT-CONTAINER-KIT / redis-operator

A Golang-based Redis operator that creates and manages Redis standalone, cluster, replication, and sentinel setups on top of Kubernetes.
https://ot-redis-operator.netlify.app/
Apache License 2.0

Conversion Webhook fails and crashes operator after upgrade from 0.15.0 to 0.15.1 #837

Open lapete opened 3 months ago

lapete commented 3 months ago

What version of redis operator are you using? 0.15.1

Does this issue reproduce with the latest release? Yes

What operating system and processor architecture are you using (kubectl version)?

What did you do?

  1. Upgrade Redis Operator from Helm chart 0.15.3 (app 0.15.0) to chart 0.15.9 (app 0.15.1)
  2. Run kubectl get redis.redis.redis.opstreelabs.in -A

What did you expect to see?

What did you see instead?

Additional Notes

Workaround: disable the conversion webhook in the CRD, i.e. the following block:

  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        service:
          namespace: redis-operator
          name: webhook-service
          path: /convert
          port: 443
      conversionReviewVersions:
        - v1beta1
        - v1beta2
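One way to disable that block, assuming you patch the CRD directly rather than through the chart, is a JSON merge patch (RFC 7386) that sets spec.conversion.strategy to None and deletes the webhook settings (a null value removes a field in a merge patch, and the webhook settings only apply to strategy Webhook). The Go sketch below only builds the patch string; the function name noneConversionPatch and the kubectl invocation in the comment are illustrative assumptions, and a later chart upgrade may restore the original block. Note that with strategy None the API server only rewrites apiVersion and performs no field-level conversion between v1beta1 and v1beta2.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// noneConversionPatch (hypothetical helper) returns a JSON merge patch for a
// CustomResourceDefinition that switches conversion to strategy None.
// Setting "webhook" to null deletes the existing webhook block.
func noneConversionPatch() (string, error) {
	patch := map[string]any{
		"spec": map[string]any{
			"conversion": map[string]any{
				"strategy": "None",
				"webhook":  nil,
			},
		},
	}
	// json.Marshal sorts map keys, so the output is deterministic.
	b, err := json.Marshal(patch)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	p, err := noneConversionPatch()
	if err != nil {
		panic(err)
	}
	// Assumed usage: pass the printed string to
	//   kubectl patch crd redis.redis.redis.opstreelabs.in --type merge -p '<patch>'
	fmt.Println(p)
}
```

The same patch would have to be applied to each of the operator's CRDs that carries the webhook block.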

Redis Operator Pod Log

{"level":"info","ts":1711030819.0439823,"msg":"Starting server","kind":"health probe","addr":"[::]:8081"}
I0321 14:20:35.166806       1 leaderelection.go:258] successfully acquired lease redis-operator/6cab913b.redis.opstreelabs.in
{"level":"info","ts":1711030835.167075,"logger":"controller.redis","msg":"Starting EventSource","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"Redis","source":"kind source: *v1beta2.Redis"}
{"level":"info","ts":1711030835.167137,"logger":"controller.redis","msg":"Starting Controller","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"Redis"}
{"level":"info","ts":1711030835.167242,"logger":"controller.rediscluster","msg":"Starting EventSource","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisCluster","source":"kind source: *v1beta2.RedisCluster"}
{"level":"info","ts":1711030835.1672716,"logger":"controller.rediscluster","msg":"Starting Controller","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisCluster"}
{"level":"info","ts":1711030835.1672428,"logger":"controller.redisreplication","msg":"Starting EventSource","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisReplication","source":"kind source: *v1beta2.RedisReplication"}
{"level":"info","ts":1711030835.167257,"logger":"controller.redissentinel","msg":"Starting EventSource","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisSentinel","source":"kind source: *v1beta2.RedisSentinel"}
{"level":"info","ts":1711030835.167332,"logger":"controller.redissentinel","msg":"Starting Controller","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisSentinel"}
{"level":"info","ts":1711030835.1673374,"logger":"controller.redisreplication","msg":"Starting Controller","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisReplication"}
W0321 14:20:35.227395       1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: failed to list *v1beta2.Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta1, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": http: server gave HTTP response to HTTPS client
E0321 14:20:35.227436       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: Failed to watch *v1beta2.Redis: failed to list *v1beta2.Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta1, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": http: server gave HTTP response to HTTPS client
W0321 14:20:36.263078       1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: failed to list *v1beta2.Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta1, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": http: server gave HTTP response to HTTPS client
E0321 14:20:36.263115       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: Failed to watch *v1beta2.Redis: failed to list *v1beta2.Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta1, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": http: server gave HTTP response to HTTPS client
W0321 14:20:38.631116       1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: failed to list *v1beta2.Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta1, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": http: server gave HTTP response to HTTPS client
E0321 14:20:38.631150       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: Failed to watch *v1beta2.Redis: failed to list *v1beta2.Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta1, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": http: server gave HTTP response to HTTPS client
W0321 14:20:43.601322       1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: failed to list *v1beta2.Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta1, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": http: server gave HTTP response to HTTPS client
E0321 14:20:43.601352       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: Failed to watch *v1beta2.Redis: failed to list *v1beta2.Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta1, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": http: server gave HTTP response to HTTPS client
W0321 14:20:54.065625       1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: failed to list *v1beta2.Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta1, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": http: server gave HTTP response to HTTPS client
E0321 14:20:54.065658       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: Failed to watch *v1beta2.Redis: failed to list *v1beta2.Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta1, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": http: server gave HTTP response to HTTPS client
W0321 14:21:19.382292       1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: failed to list *v1beta2.Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta1, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": http: server gave HTTP response to HTTPS client
E0321 14:21:19.382320       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: Failed to watch *v1beta2.Redis: failed to list *v1beta2.Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta1, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": http: server gave HTTP response to HTTPS client
W0321 14:22:03.467115       1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: failed to list *v1beta2.Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta1, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": http: server gave HTTP response to HTTPS client
E0321 14:22:03.467138       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.0/tools/cache/reflector.go:167: Failed to watch *v1beta2.Redis: failed to list *v1beta2.Redis: conversion webhook for redis.redis.opstreelabs.in/v1beta1, Kind=Redis failed: Post "https://webhook-service.redis-operator.svc:443/convert?timeout=30s": http: server gave HTTP response to HTTPS client
{"level":"error","ts":1711030955.1682036,"logger":"controller.redis","msg":"Could not wait for Cache to sync","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"Redis","error":"failed to wait for redis caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:208\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:234\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/manager/runnable_group.go:218"}
{"level":"info","ts":1711030955.1683335,"msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":1711030955.1683419,"msg":"Stopping and waiting for leader election runnables"}
{"level":"error","ts":1711030955.1681628,"logger":"controller.redisreplication","msg":"Could not wait for Cache to sync","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisReplication","error":"failed to wait for redisreplication caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:208\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:234\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/manager/runnable_group.go:218"}
{"level":"error","ts":1711030955.1683562,"msg":"error received after stop sequence was engaged","error":"failed to wait for redisreplication caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/manager/internal.go:541"}
{"level":"error","ts":1711030955.1682277,"logger":"controller.rediscluster","msg":"Could not wait for Cache to sync","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisCluster","error":"failed to wait for rediscluster caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:208\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:234\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/manager/runnable_group.go:218"}
{"level":"error","ts":1711030955.1684012,"msg":"error received after stop sequence was engaged","error":"failed to wait for rediscluster caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/manager/internal.go:541"}
{"level":"error","ts":1711030955.168312,"logger":"controller.redissentinel","msg":"Could not wait for Cache to sync","reconciler group":"redis.redis.opstreelabs.in","reconciler kind":"RedisSentinel","error":"failed to wait for redissentinel caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:208\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:234\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/manager/runnable_group.go:218"}
{"level":"info","ts":1711030955.1684356,"msg":"Stopping and waiting for caches"}
{"level":"error","ts":1711030955.1684442,"msg":"error received after stop sequence was engaged","error":"failed to wait for redissentinel caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/manager/internal.go:541"}
{"level":"info","ts":1711030955.1685617,"msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":1711030955.168597,"msg":"Wait completed, proceeding to shutdown the manager"}
{"level":"error","ts":1711030955.1686509,"logger":"setup","msg":"problem running manager","error":"failed to wait for redis caches to sync: timed out waiting for cache to be synced","stacktrace":"main.main\n\t/workspace/main.go:159\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255"}
{"level":"error","ts":1711030955.168694,"msg":"error received after stop sequence was engaged","error":"leader election lost","stacktrace":"sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/manager/internal.go:541"}
drivebyer commented 3 months ago

@lapete Thank you for the feedback!

I noticed the error "server gave HTTP response to HTTPS client" in the log. It seems the operator's webhook certificate may not be set up properly. Could you please check whether a secret named webhook-server-cert exists in the current namespace?

JuroOravec commented 3 months ago

I have the same issue and the secret webhook-server-cert is not present. I can confirm that the workaround with disabling webhooks works.

Now I'm fighting with the error "Can't open or create append-only dir appendonlydir: Permission denied".

Also, on the topic of issues with the webhook, it seems that the webhook is the only place where the redis-operator namespace is hardcoded. I initially wanted to use a different namespace, but then the webhook-service couldn't be found.

lapete commented 3 months ago

The secret isn't present in my case either.

drivebyer commented 3 months ago

@lapete You can deploy the operator with the webhook enabled by using cert-manager. For details, see https://github.com/OT-CONTAINER-KIT/helm-charts/tree/main/charts/redis-operator

jurim76 commented 2 months ago

The webhook service namespace is hardcoded in the CRD, but the Helm chart doesn't take this namespace into account. The webhook service is also enabled even with the default webhook=false.
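For a conversion webhook that references a Service, the API server calls https://<name>.<namespace>.svc:<port><path>, so the namespace written into the CRD is baked into the address. If the operator runs in a different namespace, the URL still points at redis-operator and the call can fail with service "webhook-service" not found. A small Go sketch of that URL construction, using the values from the CRD snippet in the issue; webhookServiceURL is a hypothetical helper, and the ?timeout=30s seen in the logs is added by the API server and omitted here.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// webhookServiceURL (hypothetical helper) assembles the in-cluster address
// for a Service-backed webhook: https://<name>.<namespace>.svc:<port><path>.
func webhookServiceURL(name, namespace string, port int, path string) string {
	u := url.URL{
		Scheme: "https",
		Host:   name + "." + namespace + ".svc:" + strconv.Itoa(port),
		Path:   path,
	}
	return u.String()
}

func main() {
	// Values from the CRD's conversion.webhook.clientConfig.service block.
	// prints https://webhook-service.redis-operator.svc:443/convert
	fmt.Println(webhookServiceURL("webhook-service", "redis-operator", 443, "/convert"))
	// Hypothetical: the chart installed into a namespace named "redis" creates
	// the Service at this address, which the CRD never references.
	fmt.Println(webhookServiceURL("webhook-service", "redis", 443, "/convert"))
}
```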

omniproc commented 2 months ago

@JuroOravec Have you fixed that permission denied error ("Can't open or create append-only dir appendonlydir: Permission denied")? It seems to originate from here: https://github.com/OT-CONTAINER-KIT/redis/blob/b3694110028644463d1477d689e5f529abe8616f/entrypoint.sh#L86

Persistence seems to be enabled simply by specifying the storage.volumeClaimTemplate option, which is the default in the latest chart. The volume is mounted, but permissions are never applied to it. A workaround is to disable persistence, if that's an option for you.

arusa commented 2 months ago

Same problem here: the secret webhook-server-cert does not exist, and I am seeing the same error messages:

failed to list redis.redis.opstreelabs.in/v1beta1, Kind=RedisReplication: conversion webhook for redis.redis.opstreelabs.in/v1beta2, Kind=RedisReplication failed: Post \"https://webhook-service.redis-operator.svc:443/convert?timeout=30s\": service \"webhook-service\" not found