Open nampnguyen opened 4 months ago
@nampnguyen Is this happening only during controller startup, or have you observed it persisting even after multiple reconciliations?
We can change a few things in the ACK runtime to mitigate this, such as waiting for the CARM shared informers to sync before spinning up the reconcilers.
Quick note: We really need to start stress and load testing ACK controllers, especially components like the CARM watchers. I'm pretty sure there is a lot we can catch there.
@a-hilaly In our experience, if the reconciler starts before the ack-role-account-map ConfigMap is read, the issue persists for the life of the controller. The only option is to restart the controller and roll the dice again.
We originally saw this issue when we had a CPU limit of 250m, but then increased it to 2000m. After the limit increase, the frequency of reconciler errors went down significantly. We implemented a cronjob to check for reconciler errors and restart the controller if any were found, which took care of the few times the issue popped back up. However, since that time the number of namespaces and accounts used by ACK has grown, and we now see some controllers continue to experience reconciler errors even after dozens of restarts.
@nampnguyen we started rolling out a fix for all the controllers - I'll ping here once the dynamodb controller is patched.
@a-hilaly Saw that the controller PRs that included the 0.31.0 runtime got closed. Is the fix being reworked? Thanks!
@nampnguyen Looks like we also needed https://github.com/aws-controllers-k8s/runtime/commit/861f7ed8ee62985c9a1ec7dd5e6a2a47de565e6c - we're rolling a new patch today :)
@a-hilaly We tried the latest controllers using runtime 0.32.0, but are still seeing the issue. Debug logs from iam-controller 1.3.5:
{"level":"info","ts":"2024-03-11T13:47:53.770Z","logger":"setup","msg":"initializing service controller","aws.service":"iam"}
{"level":"debug","ts":"2024-03-11T13:47:53.770Z","logger":"cache.account","msg":"Starting shared informer for accounts cache","targetConfigMap":"ack-role-account-map"}
{"level":"debug","ts":"2024-03-11T13:47:53.770Z","logger":"cache.namespace","msg":"Starting namespace cache","watchScope":[],"ignored":["kube-system","kube-public","kube-node-lease"]}
{"level":"debug","ts":"2024-03-11T13:47:53.788Z","logger":"cache.namespace","msg":"created namespace","name":"namespace-1"}
{"level":"debug","ts":"2024-03-11T13:47:53.788Z","logger":"cache.namespace","msg":"created namespace","name":"namespace-13"}
{"level":"debug","ts":"2024-03-11T13:47:53.816Z","logger":"ackrt","msg":"Initiating reconciler","reconciler kind":"OpenIDConnectProvider","resync period seconds":36000}
{"level":"debug","ts":"2024-03-11T13:47:53.816Z","logger":"ackrt","msg":"Initiating reconciler","reconciler kind":"Policy","resync period seconds":36000}
{"level":"debug","ts":"2024-03-11T13:47:53.816Z","logger":"ackrt","msg":"Initiating reconciler","reconciler kind":"Role","resync period seconds":36000}
{"level":"debug","ts":"2024-03-11T13:47:53.817Z","logger":"ackrt","msg":"Initiating reconciler","reconciler kind":"User","resync period seconds":36000}
{"level":"debug","ts":"2024-03-11T13:47:53.817Z","logger":"ackrt","msg":"Initiating reconciler","reconciler kind":"Group","resync period seconds":36000}
{"level":"debug","ts":"2024-03-11T13:47:53.817Z","logger":"ackrt","msg":"Initiating reconciler","reconciler kind":"InstanceProfile","resync period seconds":36000}
{"level":"info","ts":"2024-03-11T13:47:53.817Z","logger":"setup","msg":"starting manager","aws.service":"iam"}
...
{"level":"info","ts":"2024-03-11T13:47:53.928Z","msg":"Starting workers","controller":"policy","controllerGroup":"iam.services.k8s.aws","controllerKind":"Policy","worker count":1}
{"level":"debug","ts":"2024-03-11T13:47:53.936Z","logger":"exporter.field-export-reconciler","msg":"error did not need requeue","error":"the source resource is not synced yet"}
{"level":"debug","ts":"2024-03-11T13:47:53.938Z","logger":"ackrt","msg":"> r.Sync","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","generation":4}
{"level":"debug","ts":"2024-03-11T13:47:53.938Z","logger":"ackrt","msg":">> r.resetConditions","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","generation":4}
{"level":"debug","ts":"2024-03-11T13:47:53.938Z","logger":"ackrt","msg":"<< r.resetConditions","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","generation":4}
{"level":"debug","ts":"2024-03-11T13:47:53.938Z","logger":"ackrt","msg":">> rm.ResolveReferences","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:53.938Z","logger":"ackrt","msg":"<< rm.ResolveReferences","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:53.938Z","logger":"ackrt","msg":">> rm.EnsureTags","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:53.938Z","logger":"ackrt","msg":"<< rm.EnsureTags","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:53.938Z","logger":"ackrt","msg":">> rm.ReadOne","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:53.938Z","logger":"ackrt","msg":">>> rm.sdkFind","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:53.944Z","logger":"exporter.field-export-reconciler","msg":"error did not need requeue","error":"the source resource is not synced yet"}
// Config map read here
{"level":"debug","ts":"2024-03-11T13:47:53.972Z","logger":"cache.account","msg":"created account config map","name":"ack-role-account-map"}
// Role is empty
{"level":"debug","ts":"2024-03-11T13:47:54.016Z","logger":"ackrt","msg":"<<< rm.sdkFind","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4,"error":"AccessDenied: User: arn:aws:sts::123456789012:assumed-role/ACKControlerIRSARole/1710164873938759450 is not authorized to perform: iam:GetRole on resource: role ACKManagedRole1 because no identity-based policy allows the iam:GetRole action\n\tstatus code: 403, request id: 899dd5aa-2b83-49b1-87cf-818457adce10"}
{"level":"debug","ts":"2024-03-11T13:47:54.016Z","logger":"ackrt","msg":"<< rm.ReadOne","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4,"error":"AccessDenied: User: arn:aws:sts::123456789012:assumed-role/ACKControlerIRSARole/1710164873938759450 is not authorized to perform: iam:GetRole on resource: role ACKManagedRole1 because no identity-based policy allows the iam:GetRole action\n\tstatus code: 403, request id: 899dd5aa-2b83-49b1-87cf-818457adce10"}
{"level":"debug","ts":"2024-03-11T13:47:54.016Z","logger":"ackrt","msg":">> r.ensureConditions","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.016Z","logger":"ackrt","msg":">>> rm.IsSynced","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.016Z","logger":"ackrt","msg":"<<< rm.IsSynced","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.016Z","logger":"ackrt","msg":"<< r.ensureConditions","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.016Z","logger":"ackrt","msg":"< r.Sync","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4,"error":"AccessDenied: User: arn:aws:sts::123456789012:assumed-role/ACKControlerIRSARole/1710164873938759450 is not authorized to perform: iam:GetRole on resource: role ACKManagedRole1 because no identity-based policy allows the iam:GetRole action\n\tstatus code: 403, request id: 899dd5aa-2b83-49b1-87cf-818457adce10"}
{"level":"debug","ts":"2024-03-11T13:47:54.016Z","logger":"ackrt","msg":"> r.patchResourceStatus","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.016Z","logger":"ackrt","msg":">> kc.Patch (status)","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.038Z","logger":"ackrt","msg":"patched resource status","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4,"json":"{\"metadata\":{\"resourceVersion\":\"462345426\"},\"spec\":{...},\"status\":{\"conditions\":[{\"message\":\"AccessDenied: User: arn:aws:sts::123456789012:assumed-role/ACKControlerIRSARole/1710164873938759450 is not authorized to perform: iam:GetRole on resource: role ACKManagedRole1 because no identity-based policy allows the iam:GetRole action\\n\\tstatus code: 403, request id: 899dd5aa-2b83-49b1-87cf-818457adce10\",\"status\":\"True\",\"type\":\"ACK.Recoverable\"},{\"lastTransitionTime\":\"2024-03-11T13:47:54Z\",\"message\":\"Unable to determine if desired resource state matches latest observed state\",\"reason\":\"AccessDenied: User: arn:aws:sts::123456789012:assumed-role/ACKControlerIRSARole/1710164873938759450 is not authorized to perform: iam:GetRole on resource: role ACKManagedRole1 because no identity-based policy allows the iam:GetRole action\\n\\tstatus code: 403, request id: 899dd5aa-2b83-49b1-87cf-818457adce10\",\"status\":\"Unknown\",\"type\":\"ACK.ResourceSynced\"}]}}"}
{"level":"debug","ts":"2024-03-11T13:47:54.038Z","logger":"ackrt","msg":"<< kc.Patch (status)","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.038Z","logger":"ackrt","msg":"< r.patchResourceStatus","kind":"Role","namespace":"namespace-1","name":"iam-role-1","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"error","ts":"2024-03-11T13:47:54.038Z","msg":"Reconciler error","controller":"role","controllerGroup":"iam.services.k8s.aws","controllerKind":"Role","Role":{"name":"iam-role-1","namespace":"namespace-1"},"namespace":"namespace-1","name":"iam-role-1","reconcileID":"9d3c3474-c71e-4a60-a544-efe1bfe759c9","error":"AccessDenied: User: arn:aws:sts::123456789012:assumed-role/ACKControlerIRSARole/1710164873938759450 is not authorized to perform: iam:GetRole on resource: role ACKManagedRole1 because no identity-based policy allows the iam:GetRole action\n\tstatus code: 403, request id: 899dd5aa-2b83-49b1-87cf-818457adce10","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227"}
{"level":"debug","ts":"2024-03-11T13:47:54.044Z","logger":"exporter.field-export-reconciler","msg":"error did not need requeue","error":"the source resource is not synced yet"}
// Role is empty
{"level":"debug","ts":"2024-03-11T13:47:54.044Z","logger":"ackrt","msg":"> r.Sync","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.044Z","logger":"ackrt","msg":">> r.resetConditions","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.044Z","logger":"ackrt","msg":"<< r.resetConditions","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.044Z","logger":"ackrt","msg":">> rm.ResolveReferences","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.044Z","logger":"ackrt","msg":"<< rm.ResolveReferences","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.044Z","logger":"ackrt","msg":">> rm.EnsureTags","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.044Z","logger":"ackrt","msg":"<< rm.EnsureTags","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.044Z","logger":"ackrt","msg":">> rm.ReadOne","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.044Z","logger":"ackrt","msg":">>> rm.sdkFind","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.074Z","logger":"ackrt","msg":"<<< rm.sdkFind","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4,"error":"AccessDenied: User: arn:aws:sts::123456789012:assumed-role/ACKControlerIRSARole/1710164873938759450 is not authorized to perform: iam:GetRole on resource: role ACKManagedRole2 because no identity-based policy allows the iam:GetRole action\n\tstatus code: 403, request id: ad1e102f-2db9-488a-9dbe-6b1894a15626"}
{"level":"debug","ts":"2024-03-11T13:47:54.074Z","logger":"ackrt","msg":"<< rm.ReadOne","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4,"error":"AccessDenied: User: arn:aws:sts::123456789012:assumed-role/ACKControlerIRSARole/1710164873938759450 is not authorized to perform: iam:GetRole on resource: role ACKManagedRole2 because no identity-based policy allows the iam:GetRole action\n\tstatus code: 403, request id: ad1e102f-2db9-488a-9dbe-6b1894a15626"}
{"level":"debug","ts":"2024-03-11T13:47:54.074Z","logger":"ackrt","msg":">> r.ensureConditions","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.074Z","logger":"ackrt","msg":">>> rm.IsSynced","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.074Z","logger":"ackrt","msg":"<<< rm.IsSynced","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.074Z","logger":"ackrt","msg":"<< r.ensureConditions","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.074Z","logger":"ackrt","msg":"< r.Sync","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4,"error":"AccessDenied: User: arn:aws:sts::123456789012:assumed-role/ACKControlerIRSARole/1710164873938759450 is not authorized to perform: iam:GetRole on resource: role ACKManagedRole2 because no identity-based policy allows the iam:GetRole action\n\tstatus code: 403, request id: ad1e102f-2db9-488a-9dbe-6b1894a15626"}
{"level":"debug","ts":"2024-03-11T13:47:54.074Z","logger":"ackrt","msg":"> r.patchResourceStatus","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.074Z","logger":"ackrt","msg":">> kc.Patch (status)","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.091Z","logger":"ackrt","msg":"patched resource status","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4,"json":"{\"metadata\":{\"resourceVersion\":\"462345427\"},\"spec\":{...},\"status\":{\"conditions\":[{\"message\":\"AccessDenied: User: arn:aws:sts::123456789012:assumed-role/ACKControlerIRSARole/1710164873938759450 is not authorized to perform: iam:GetRole on resource: role ACKManagedRole2 because no identity-based policy allows the iam:GetRole action\\n\\tstatus code: 403, request id: ad1e102f-2db9-488a-9dbe-6b1894a15626\",\"status\":\"True\",\"type\":\"ACK.Recoverable\"},{\"lastTransitionTime\":\"2024-03-11T13:47:54Z\",\"message\":\"Unable to determine if desired resource state matches latest observed state\",\"reason\":\"AccessDenied: User: arn:aws:sts::123456789012:assumed-role/ACKControlerIRSARole/1710164873938759450 is not authorized to perform: iam:GetRole on resource: role ACKManagedRole2 because no identity-based policy allows the iam:GetRole action\\n\\tstatus code: 403, request id: ad1e102f-2db9-488a-9dbe-6b1894a15626\",\"status\":\"Unknown\",\"type\":\"ACK.ResourceSynced\"}]}}"}
{"level":"debug","ts":"2024-03-11T13:47:54.091Z","logger":"ackrt","msg":"<< kc.Patch (status)","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"debug","ts":"2024-03-11T13:47:54.091Z","logger":"ackrt","msg":"< r.patchResourceStatus","kind":"Role","namespace":"namespace-2","name":"iam-role-2","account":"123456789012","role":"","region":"us-east-1","is_adopted":false,"generation":4}
{"level":"error","ts":"2024-03-11T13:47:54.091Z","msg":"Reconciler error","controller":"role","controllerGroup":"iam.services.k8s.aws","controllerKind":"Role","Role":{"name":"iam-role-2","namespace":"namespace-2"},"namespace":"namespace-2","name":"iam-role-2","reconcileID":"58482afd-89b1-45eb-8630-724deacf338c","error":"AccessDenied: User: arn:aws:sts::123456789012:assumed-role/ACKControlerIRSARole/1710164873938759450 is not authorized to perform: iam:GetRole on resource: role ACKManagedRole2 because no identity-based policy allows the iam:GetRole action\n\tstatus code: 403, request id: ad1e102f-2db9-488a-9dbe-6b1894a15626","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227"}
Even waiting several minutes, I'm not seeing the re-queuing behavior described in the PRs.
@nampnguyen I will try to reproduce locally, maybe I'm missing something here...
qq: does the namespace have an owner-account-id annotation? The only case where controllers will skip the requeue is if the namespaces aren't annotated.
@a-hilaly Yes, the namespaces do have the services.k8s.aws/owner-account-id annotation. Looking at the fix, I think the issue we're currently experiencing may have been introduced by https://github.com/aws-controllers-k8s/runtime/commit/24070d995cf503d77224b15771aa3c938fe6062c
If I'm reading this correctly, this line here assumes that if the resource's Account ID is the same as the controller's IRSA account ID, then CARM is not used.
In the most recent logs I shared, the resource's Account ID and Controller account are in fact the same, but we still need the CARM role pivot. I think this can be fixed by always checking for the namespace annotation and if present use CARM.
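To make the suggestion concrete, here is a minimal sketch of role selection that keys off the namespace annotation rather than comparing account IDs. This is not the runtime's actual code: `selectRoleARN`, `errRoleNotCached`, and the plain maps standing in for the namespace and account caches are all hypothetical.

```go
package main

import "errors"

// ownerAccountAnnotation is the namespace annotation mentioned in this thread.
const ownerAccountAnnotation = "services.k8s.aws/owner-account-id"

// errRoleNotCached is a hypothetical sentinel telling the reconciler to
// requeue until the ack-role-account-map ConfigMap has been read.
var errRoleNotCached = errors.New("CARM role not cached yet, requeue")

// selectRoleARN decides which role to assume for a resource.
//   - No owner-account-id annotation: return "" so the caller keeps using the
//     controller's own (IRSA) credentials.
//   - Annotation present: always go through CARM, even when the account ID
//     matches the controller's own account.
//   - Annotation present but no role cached yet: return errRoleNotCached
//     instead of silently falling back to the IRSA role.
func selectRoleARN(nsAnnotations, roleByAccount map[string]string) (string, error) {
	accountID, annotated := nsAnnotations[ownerAccountAnnotation]
	if !annotated || accountID == "" {
		return "", nil
	}
	roleARN, ok := roleByAccount[accountID]
	if !ok || roleARN == "" {
		return "", errRoleNotCached
	}
	return roleARN, nil
}
```

With this shape, a same-account namespace that is annotated still gets the role pivot, and a reconcile that races the ConfigMap read is retried rather than run with the wrong role.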
@nampnguyen makes sense~ working on a fix! I think we might also want to enable CARM even in single-namespace watch mode.
If I'm reading this correctly, this line here assumes that if the resource's Account ID is the same as the controller's IRSA account ID, then CARM is not used.
Correct. Not sure, but this might have been the case even before.
@a-hilaly @nampnguyen We are also facing this exact issue when using most recent version of iam-controller. Glad to hear a fix is being worked on. Thanks!
@a-hilaly we do still face this same issue even after your pull request from April. Just wondering if someone is looking at this problem still?
@mattzech @mumlawski @nampnguyen Sorry folks, I somehow got sidetracked from this issue. This is a high priority issue and I'll ship a fix for it ASAP.
Describe the bug
When a controller starts while using CARM, particularly if the pod has a CPU limit specified (~250m), the account config map can sometimes be read after reconciliation starts. When this happens, the controller no longer assumes the appropriate cross account role, which results in AccessDenied errors because the IRSA role does not have permissions to resources in other accounts.
The problem seems to be getting worse over time and may be related to the number of namespaces.
Even after the ack-role-account-map is read, the controller does not use the correct cross account role.
Debug logs (with some redactions):
Steps to reproduce
Expected outcome
Resource resyncing waits until after the ack-role-account-map ConfigMap is read, or future resyncs pivot to the cross account role if the ConfigMap is read after resyncing starts.
Environment