awslabs / aws-orbit-workbench

A Data Platform built for AWS, powered by Kubernetes.
https://awslabs.github.io/aws-orbit-workbench/
Apache License 2.0

[BUG] - Policy limit exceeded #1288

Closed: rb201 closed this issue 2 years ago

rb201 commented 2 years ago

This is currently occurring in the nightly deploy env:

```
[2021-12-28 03:40:42,188][_remote.py   : 30] [CODEBUILD]     deploy_env(env_name=env_name, manifest_dir=manifest_dir)
[2021-12-28 03:40:42,188][_remote.py   : 30] [CODEBUILD]   File "/root/.venv/lib/python3.7/site-packages/aws_codeseeder/codeseeder.py", line 199, in wrapper
[2021-12-28 03:40:42,188][_remote.py   : 30] [CODEBUILD]     return func(*args, **kwargs)
[2021-12-28 03:40:42,188][_remote.py   : 30] [CODEBUILD]   File "/root/.venv/lib/python3.7/site-packages/aws_orbit/remote_files/deploy.py", line 371, in deploy_env
[2021-12-28 03:40:42,188][_remote.py   : 30] [CODEBUILD]     changeset=changeset,
[2021-12-28 03:40:42,188][_remote.py   : 30] [CODEBUILD]   File "/root/.venv/lib/python3.7/site-packages/aws_orbit/remote_files/eksctl.py", line 494, in deploy_env
[2021-12-28 03:40:42,188][_remote.py   : 30] [CODEBUILD]     f"{context.eks_oidc_provider}:sub": f"system:serviceaccount:orbit-system:orbit-{context.name}-admin"
[2021-12-28 03:40:42,188][_remote.py   : 30] [CODEBUILD]   File "/root/.venv/lib/python3.7/site-packages/aws_orbit/services/iam.py", line 110, in add_assume_role_statement
[2021-12-28 03:40:42,188][_remote.py   : 30] [CODEBUILD]     iam_client.update_assume_role_policy(RoleName=role_name, PolicyDocument=policy_body)
[2021-12-28 03:40:42,188][_remote.py   : 30] [CODEBUILD]   File "/root/.venv/lib/python3.7/site-packages/botocore/client.py", line 388, in _api_call
[2021-12-28 03:40:42,188][_remote.py   : 30] [CODEBUILD]     return self._make_api_call(operation_name, kwargs)
[2021-12-28 03:40:42,188][_remote.py   : 30] [CODEBUILD]   File "/root/.venv/lib/python3.7/site-packages/botocore/client.py", line 708, in _make_api_call
[2021-12-28 03:40:42,188][_remote.py   : 30] [CODEBUILD]     raise error_class(parsed_response, operation_name)
[2021-12-28 03:40:42,188][_remote.py   : 30] [CODEBUILD] botocore.errorfactory.LimitExceededException: An error occurred (LimitExceeded) when calling the UpdateAssumeRolePolicy operation: Cannot exceed quota for ACLSizePerRole: 2048
```
dgraeber commented 2 years ago

Looking at this role (`orbit-nightly-us-west-2-admin`), I see that its trust policy has 4 OIDC entries allowing `sts:AssumeRoleWithWebIdentity`. I believe there should be only one: the entry for the active cluster.
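To confirm how many clusters a role trusts, one could list the `Federated` principal of every web-identity statement. This is a minimal sketch that assumes the trust policy is already a Python dict, which is the form boto3's `iam_client.get_role(RoleName=...)["Role"]["AssumeRolePolicyDocument"]` returns it in. The helper names, account ID, and cluster IDs below are hypothetical:

```python
from typing import Any, Dict, List

WEB_IDENTITY_ACTION = "sts:AssumeRoleWithWebIdentity"


def _as_list(value: Any) -> List[Any]:
    """IAM policy fields can hold a single item or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def oidc_providers(trust_policy: Dict[str, Any]) -> List[str]:
    """Return the Federated principal of every statement that allows AssumeRoleWithWebIdentity."""
    providers: List[str] = []
    for stmt in _as_list(trust_policy.get("Statement")):
        if WEB_IDENTITY_ACTION not in _as_list(stmt.get("Action")):
            continue
        principal = stmt.get("Principal")
        # Principal may also be a bare string such as "*"; only dict principals carry "Federated".
        if isinstance(principal, dict):
            providers.extend(_as_list(principal.get("Federated")))
    return providers


# Hypothetical trust policy after one recycle: a stale entry plus the active cluster.
ARN_PREFIX = "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/"
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Principal": {"Federated": ARN_PREFIX + "DELETEDCLUSTER"}, "Action": WEB_IDENTITY_ACTION},
        {"Effect": "Allow", "Principal": {"Federated": ARN_PREFIX + "ACTIVECLUSTER"}, "Action": WEB_IDENTITY_ACTION},
    ],
}

providers = oidc_providers(example_policy)
print(len(providers))  # 2
```

More than one provider for the same service account points to stale entries left behind by deleted clusters.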

dgraeber commented 2 years ago

Looking at the code, the `orbit toolkit` command creates the IAM admin role. Once the role exists, `orbit deploy env` updates the role's trust policy with the cluster's OIDC provider.

When we delete the env (the cluster) and create a new one, the trust policy is updated with the new cluster's OIDC provider, but the entries for the deleted cluster are NOT removed from the policy.
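The effect of this append-only update can be estimated offline. The sketch below is a simplified model, not Orbit's code: it starts from an empty statement list, appends one statement per new cluster in the shape suggested by the condition key in the traceback (`<oidc_provider>:sub` mapped to the service account), and measures compact JSON, since IAM does not count whitespace toward policy size quotas. The account ID, issuer URLs, and subject are hypothetical:

```python
import json
from typing import Any, Dict, List

QUOTA = 2048  # ACLSizePerRole limit reported in the error above


def web_identity_statement(account_id: str, oidc_provider: str, subject: str) -> Dict[str, Any]:
    """Trust statement letting one Kubernetes service account assume the role via a cluster's OIDC provider."""
    return {
        "Effect": "Allow",
        "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"},
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {"StringEquals": {f"{oidc_provider}:sub": subject}},
    }


def policy_size(policy: Dict[str, Any]) -> int:
    """Approximate IAM's size check: whitespace is not counted, so measure compact JSON."""
    return len(json.dumps(policy, separators=(",", ":")))


def simulate_append_only(account_id: str, subject: str, limit: int = QUOTA) -> List[int]:
    """Append a statement for each new cluster, never pruning, until the policy exceeds `limit`.

    Returns the policy size after each cluster; the last entry is the first one over the limit.
    """
    policy: Dict[str, Any] = {"Version": "2012-10-17", "Statement": []}
    sizes: List[int] = []
    cluster = 0
    while not sizes or sizes[-1] <= limit:
        cluster += 1
        # Hypothetical issuer URL; the zero-padded hex ID only stands in for a real cluster ID.
        provider = f"oidc.eks.us-west-2.amazonaws.com/id/{cluster:032X}"
        policy["Statement"].append(web_identity_statement(account_id, provider, subject))
        sizes.append(policy_size(policy))
    return sizes


sizes = simulate_append_only("111122223333", "system:serviceaccount:orbit-system:orbit-nightly-admin")
print(f"cluster #{len(sizes)} pushes the policy to {sizes[-1]} characters (> {QUOTA})")
```

With these made-up values each statement adds roughly 360 characters, so only a handful of clusters fit under 2048; a real trust policy that already holds other statements would overflow sooner.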

So recycling the env multiple times fills up the trust policy until an update exceeds the 2048 `ACLSizePerRole` quota.
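One possible fix is to drop any web-identity statement that already grants the same service account before adding the current cluster's entry, so each recycle replaces the old entry instead of adding to it. Here is a minimal sketch of that idea as a pure function over the policy dict. `set_oidc_trust` and the example IDs are hypothetical, and this is not the project's actual `add_assume_role_statement`:

```python
import copy
from typing import Any, Dict, List

WEB_IDENTITY_ACTION = "sts:AssumeRoleWithWebIdentity"


def _as_list(value: Any) -> List[Any]:
    """IAM policy fields can hold a single item or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _grants_subject(stmt: Dict[str, Any], subject: str) -> bool:
    """True if stmt is a web-identity statement whose '<issuer>:sub' condition is exactly `subject`."""
    if WEB_IDENTITY_ACTION not in _as_list(stmt.get("Action")):
        return False
    string_equals = stmt.get("Condition", {}).get("StringEquals", {})
    # Only single-value exact matches count as stale, so statements shared with
    # other service accounts are left alone.
    return any(key.endswith(":sub") and value == subject for key, value in string_equals.items())


def set_oidc_trust(policy: Dict[str, Any], account_id: str, oidc_provider: str, subject: str) -> Dict[str, Any]:
    """Return a copy of policy that trusts only oidc_provider for subject (older clusters are dropped)."""
    updated = copy.deepcopy(policy)
    kept = [s for s in _as_list(updated.get("Statement")) if not _grants_subject(s, subject)]
    kept.append(
        {
            "Effect": "Allow",
            "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"},
            "Action": WEB_IDENTITY_ACTION,
            "Condition": {"StringEquals": {f"{oidc_provider}:sub": subject}},
        }
    )
    updated["Statement"] = kept
    return updated


# Recycle a hypothetical env ten times; the policy never grows past one OIDC statement.
policy: Dict[str, Any] = {"Version": "2012-10-17", "Statement": []}
subject = "system:serviceaccount:orbit-system:orbit-nightly-admin"
for cluster in range(10):
    provider = f"oidc.eks.us-west-2.amazonaws.com/id/{cluster:032X}"
    policy = set_oidc_trust(policy, "111122223333", provider, subject)

print(len(policy["Statement"]))  # 1
```

The returned dict could then be serialized with `json.dumps` and sent through the same `update_assume_role_policy(RoleName=..., PolicyDocument=...)` call that fails in the traceback. Matching on the `:sub` value rather than on the provider ARN leaves statements for other service accounts untouched, and pruning at deploy time would also clean up roles that already carry stale entries, like the one above.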