Closed: ranferimeza closed this issue 6 months ago
We've run on EKS 1.25 in the past, and our e2e tests run on 1.25 as well.
Can you try using the latest 1.19 chart?
I'll try this and report back.
Same result, unfortunately. Same error message...
Can you set logging to debug and share the entire log? I can't reproduce this.
```
time="2023-12-13T22:32:15Z" level=info msg=----------------------------------
time="2023-12-13T22:32:15Z" level=info msg="rbac-manager 1.7.0 running"
time="2023-12-13T22:32:15Z" level=info msg=----------------------------------
time="2023-12-13T22:32:15Z" level=info msg="Registering components"
time="2023-12-13T22:32:15Z" level=info msg="Watching resources related to RBAC Definitions"
time="2023-12-13T22:32:15Z" level=info msg="Watching RBAC Definitions"
[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed: goroutine 84 [running]:
runtime/debug.Stack() /usr/local/go/src/runtime/debug/stack.go:24 +0x65
sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot() /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/log/log.go:59 +0xbd
sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).Error(0xc00009eac0, {0x1c62700, 0xc0001a20c0}, {0x1a3d2aa, 0x21}, {0x0, 0x0, 0x0}) /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/log/deleg.go:139 +0x68
github.com/go-logr/logr.Logger.Error({{0x1c7ccd8?, 0xc00009eac0?}, 0x0?}, {0x1c62700, 0xc0001a20c0}, {0x1a3d2aa, 0x21}, {0x0, 0x0, 0x0}) /go/pkg/mod/github.com/go-logr/logr@v1.2.4/logr.go:299 +0xda
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1({0x1c7a038?, 0xc000184870?}) /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/source/kind.go:68 +0x1a5
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1(0xc000184870?, {0x1c7a038?, 0xc000184870?}) /go/pkg/mod/k8s.io/apimachinery@v0.27.3/pkg/util/wait/loop.go:62 +0x5d
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext({0x1c7a038, 0xc000184870}, {0x1c78d40?, 0xc0001a2a80}, 0x1, 0x0, 0x0?) /go/pkg/mod/k8s.io/apimachinery@v0.27.3/pkg/util/wait/loop.go:63 +0x205
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel({0x1c7a038, 0xc000184870}, 0x0?, 0x0?, 0x0?) /go/pkg/mod/k8s.io/apimachinery@v0.27.3/pkg/util/wait/poll.go:33 +0x5c
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1() /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/source/kind.go:56 +0xfa
created by sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/source/kind.go:48 +0x1e5
time="2023-12-13T22:34:45Z" level=error msg="[failed to wait for rbacdefinition caches to sync: timed out waiting for cache to be synced for Kind *v1beta1.RBACDefinition, failed waiting for all runnables to end within grace period of 30s: context deadline exceeded]: unable to run the manager"
```
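As an aside, the `[controller-runtime] log.SetLogger(...) was never called` warning in the trace above is a controller-runtime 0.15 behavior: since that release the library expects the root logger to be set explicitly at startup, and suppresses its own log output otherwise. A minimal sketch of how an operator's `main.go` could do this (the zap options here are illustrative, not rbac-manager's actual setup):

```go
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	// Set the controller-runtime root logger before constructing the manager;
	// otherwise controller-runtime 0.15+ prints the "log.SetLogger(...) was
	// never called" warning and drops its internal log lines.
	ctrl.SetLogger(zap.New(zap.UseDevMode(false)))

	// ... build the manager, register controllers, and call mgr.Start() ...
}
```

Note that this warning only hides controller-runtime's internal logs; it is not itself the cause of the cache-sync timeout.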
Please let me know how to set rbac-manager's logging to debug. I don't really deal with the apps inside the K8s cluster, as I only build the infra and let people install things on it.
I don't think debug is going to give us a ton more info, but you should be able to set it by adding the Helm value `extraArgs=log-level=debug`.
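For reference, in a values file that suggestion would look roughly like this (a sketch; the exact shape of `extraArgs` may differ by chart version, so check the chart's `values.yaml`):

```yaml
# Hypothetical values.yaml snippet; verify the key shape against the chart.
extraArgs:
  - --log-level=debug
```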
It looks like in v1.6.0 we upgraded client-go from 0.26 to 0.27 which may have introduced an incompatibility with k8s 1.25 (which is End of Life at this point). So perhaps going back to 1.5.0 of rbac-manager (chart version 1.16.0) would work?
https://github.com/FairwindsOps/rbac-manager/releases/tag/v1.5.0 https://artifacthub.io/packages/helm/fairwinds-stable/rbac-manager/1.16.0
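Assuming the standard Fairwinds chart repo, pinning to that chart version would look roughly like this (a sketch of a plain Helm install, not your helmsman setup; the namespace is a placeholder):

```shell
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm upgrade --install rbac-manager fairwinds-stable/rbac-manager \
  --namespace rbac-manager --version 1.16.0
```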
I'll try this and let you know. Thanks!
Update: I downgraded rbac-manager to the suggested version, and now the pod fails its liveness probe... I'll go back to the latest version and enable debug logging to see if that helps figure this out. Thanks!
The liveness probe failure could be due to CPU throttling. As a matter of fact, so could your original issue. Is this a large cluster? What are your CPU/memory requests and limits?
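To rule out throttling, one could set explicit resources via chart values; a hedged sketch (the `resources:` key is the common chart convention, and the numbers are illustrative, not tuned recommendations):

```yaml
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    # Omitting a CPU limit avoids CFS throttling entirely.
    memory: 256Mi
```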
Hi, the cluster is "large": 6 nodes total, separated into 3 node groups of 2 nodes each, dedicated to separate workloads. The instances are m5.large for the "lesser" node groups and c5a.xlarge for the most important node group, where the main apps run. rbac-manager runs on one of the m5.large node groups, of course. There are no limits: we deploy the same configuration and instance types on 1.24 and did not see this issue there.
And, as you mentioned, debug did not help:
```
time="2023-12-15T15:14:35Z" level=info msg=----------------------------------
time="2023-12-15T15:14:35Z" level=info msg="rbac-manager 1.7.0 running"
time="2023-12-15T15:14:35Z" level=info msg=----------------------------------
time="2023-12-15T15:14:35Z" level=debug msg="Setting up client for manager"
time="2023-12-15T15:14:35Z" level=debug msg="Setting up manager"
time="2023-12-15T15:14:35Z" level=info msg="Registering components"
time="2023-12-15T15:14:35Z" level=debug msg="Setting up scheme"
time="2023-12-15T15:14:35Z" level=debug msg="Setting up controller"
time="2023-12-15T15:14:35Z" level=info msg="Watching resources related to RBAC Definitions"
time="2023-12-15T15:14:35Z" level=info msg="Watching RBAC Definitions"
[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed: goroutine 63 [running]:
runtime/debug.Stack() /usr/local/go/src/runtime/debug/stack.go:24 +0x65
sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot() /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/log/log.go:59 +0xbd
sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).Error(0xc000098fc0, {0x1c62700, 0xc000446620}, {0x1a3d2aa, 0x21}, {0x0, 0x0, 0x0}) /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/log/deleg.go:139 +0x68
github.com/go-logr/logr.Logger.Error({{0x1c7ccd8?, 0xc000098fc0?}, 0x0?}, {0x1c62700, 0xc000446620}, {0x1a3d2aa, 0x21}, {0x0, 0x0, 0x0}) /go/pkg/mod/github.com/go-logr/logr@v1.2.4/logr.go:299 +0xda
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1({0x1c7a038?, 0xc00039e190?}) /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/source/kind.go:68 +0x1a5
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1(0xc00039e190?, {0x1c7a038?, 0xc00039e190?}) /go/pkg/mod/k8s.io/apimachinery@v0.27.3/pkg/util/wait/loop.go:62 +0x5d
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext({0x1c7a038, 0xc00039e190}, {0x1c78d40?, 0xc0001edde0}, 0x1, 0x0, 0x0?) /go/pkg/mod/k8s.io/apimachinery@v0.27.3/pkg/util/wait/loop.go:63 +0x205
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel({0x1c7a038, 0xc00039e190}, 0x0?, 0x0?, 0x0?) /go/pkg/mod/k8s.io/apimachinery@v0.27.3/pkg/util/wait/poll.go:33 +0x5c
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1() /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/source/kind.go:56 +0xfa
created by sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/source/kind.go:48 +0x1e5
time="2023-12-15T15:17:05Z" level=error msg="[failed to wait for namespace caches to sync: timed out waiting for cache to be synced for Kind *v1.Namespace, failed waiting for all runnables to end within grace period of 30s: context deadline exceeded]: unable to run the manager"
```
What happened?
rbac-manager is stuck in CrashLoopBackOff on EKS 1.25 after logging the following error. It was running fine on EKS 1.24:

```
time="2023-12-13T19:59:59Z" level=error msg="[failed to wait for rbacdefinition caches to sync: timed out waiting for cache to be synced for Kind *v1beta1.RBACDefinition, failed waiting for all runnables to end within grace period of 30s: context deadline exceeded]: unable to run the manager"
```
I can see the CRDs are created, so I am unable to identify what is causing this problem.
What did you expect to happen?
rbac-manager to run without issues.
How can we reproduce this?
Just try to install rbac-manager using helmsman. I only supply tolerations to make it run on a specific node group; the rest of the values supplied are the chart defaults.
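For reproduction, a minimal helmsman desired-state entry might look like this (a TOML sketch; the namespace, values file name, and version pin are placeholders, and the tolerations would live in the referenced values file rather than in the DSF itself):

```toml
[apps]
  [apps.rbac-manager]
    namespace = "rbac-manager"
    chart = "fairwinds-stable/rbac-manager"
    version = "1.18.0"
    enabled = true
    valuesFile = "rbac-manager-values.yaml"  # tolerations supplied here
```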
Version
rbac-manager-1.18.0
Additional context
No response