Closed leelalagudu closed 2 years ago
@jmejco , @tamara-h , the PR's security checks would fail as the CI build project's config file does not yet have SupportNestedOU
parameter. Please update the CI repository with this config value, and re-trigger the security checks build project.
Fails to self-mutate between current version in master and the new version because, in the OrgMain account, on updating the CF stack env-aws-sso-extensions-for-enterprise-orgEventsProcessorStack
, envprocessTargetAccountSM
fails because Resource handler returned message: "'arn:aws:iam::accountid:role/env-processTargetAccountSMRole' is not authorized to create managed-rule. (Service: AWSStepFunctions; Status Code: 400; Error Code: AccessDeniedException; Request ID: ... guid; ... etc
It's really a pain when the self-mutation fails :(
It created drift, but I manually attached the AWS managed policy CloudWatchEventsFullAccess
to the env-processTargetAccountSMRole
to resolve this issue.
It created drift, but I manually attached the AWS managed policy
CloudWatchEventsFullAccess
to theenv-processTargetAccountSMRole
to resolve this issue.
I am not sure why the self-mutate is'nt picking this up. I was aware of this issue already, and had speicifically added https://github.com/aws-samples/aws-sso-extensions-for-enterprise/blob/8647fa894da05dd3de2d37709cc9896926ce4ff8/lib/stacks/pipelineStageStacks/org-events-processor.ts#L231 to handle this access issue.
@allquixotic , you are on the latest commit for main in 3.1.4 right? And, upgraded to my branch for 3.1.5? Also, in your deployment in orgmain, can you look at env-processTargetAccountSMRole
and validate if a custom policy with the permissions described in the code snippeet above has been created? I wanted to see if the policy itself is not creating?
@jmejco , can you please validate the behaviour your end and see if the self-mutate picks up the IAM poilcy as part of the change set and applies it?
OK... unless I'm missing something, this PR actually makes my problem with S3 permission sets worse. Now I'm getting must have required property 'Sid'
and must match a schema in anyOf
for every statement in every inlinePolicyDocument. These are valid permission sets that used to work even in 3.1.4.
Sid
is not a required property of an IAM policy statement, either in IAM itself, or in AWS SSO permission sets' inline policies.
Also, I'm still getting ThrottlingException
s on S3 links_data
updates. I am pushing a lot of them at once during a migration (somewhere around 300+) so maybe my test scenario is unusual, but the only way I can get through the throttling exceptions is to sleep for several seconds between each S3 upload.
@allquixotic , you are on the latest commit for main in 3.1.4 right?
Previously, yes.
And, upgraded to my branch for 3.1.5?
Yes.
Also, in your deployment in orgmain, can you look at
env-processTargetAccountSMRole
and validate if a custom policy with the permissions described in the code snippeet above has been created? I wanted to see if the policy itself is not creating?
Those permissions do not exist in envprocessTargetAccountSMRoleDefaultPolicy...
and there is no other inline or managed policy attached to env-processTargetAccountSMRole
.
The recent fixes (current as of commit id 2958f9d) get rid of all the errors I’ve been seeing. However, when updating an existing permission set that had no inline policy and no managed policies, when I try to add AWS managed policies in the S3 permission set, it doesn’t provide any error over SNS, but it doesn’t update.
Here is my permission set data:
{
“permissionSetName”: “foo”, “sessionDurationInMinutes”: “720”,
“managedPoliciesArnList”: [“arn:aws:iam::aws:policy/AdministratorAccess”, “arn:aws:iam::aws:policy/AWSSupportAccess”],
“inlinePolicyDocument”: {},
“relayState”: “https://us-east-1.console.aws.amazon.com/console/home?region=us-east-1#”,
“tags”: [ { “Key”: “something”, “Value”: “someval” } ]
}
The previously existing permission set in AWS SSO has zero managed policies and no inline policy.
Also, even after the latest commit (2958f9d), I still get rate limit exceeded when pushing to S3 many (new) permission sets, permission set updates, new links data, or links data updates. I have to sleep for about 9 seconds between S3 PutObject calls to keep it from getting rate limited.
For clarification, I’m not getting rate limited by S3; it’s the Lambda functions within SSO Extensions that get rate limited by AWS SSO, far as I can tell.
Hi @allquixotic ,
I had re-factored the queueing design (now taking into consideration that Lambda-SQS trigger does not really abide by concurrency limits you set up for the lambda) such that using both a batch size of 1, along with message group ID's ranging between 0 and 9 (take the last digit of the account) and the queue being FIFO, this should theoretically stay within the 10 live calls threshold at any given time.
Rather than hyptohesising this further, we will be setting up load test environment with approximately ~50 accounts and validate /fix the behaviour.
Regarding the permission set scenario you described, using the same commit ID, I first created a PS with empty inline policy and managed policies array, and then updated it to reflect this:
{ "permissionSetName": "SysAdmin-ps", "sessionDurationInMinutes": "240", "managedPoliciesArnList": ["arn:aws:iam::aws:policy/AdministratorAccess", "arn:aws:iam::aws:policy/AWSSupportAccess"], "inlinePolicyDocument": {}, "relayState": "https://eu-west-1.console.aws.amazon.com/systems-manager/managed-instances?region=eu-west-1#", "tags": [ { "Key": "versionid", "Value": "01" }, { "Key": "team", "Value": "SysAdmins" } ] }
, and this change propagated successfully in my environment and I could see the change trickle down through SSO. Can you please check this once again? Only because I am unable to re-produce this locally
Regarding the permission set scenario you described, using the same commit ID, I first created a PS with empty inline policy and managed policies array, and then updated it to reflect this:
...
, and this change propagated successfully in my environment and I could see the change trickle down through SSO. Can you please check this once again? Only because I am unable to re-produce this locally
I tried it again, and it still does not attach the managed policies. It doesn't throw any error over SNS either. Is there some specific function I should look for logs?
Regarding the permission set scenario you described, using the same commit ID, I first created a PS with empty inline policy and managed policies array, and then updated it to reflect this:
...
, and this change propagated successfully in my environment and I could see the change trickle down through SSO. Can you please check this once again? Only because I am unable to re-produce this locally
I tried it again, and it still does not attach the managed policies. It doesn't throw any error over SNS either. Is there some specific function I should look for logs?
@allquixotic , when you are provisioning the changed permission set, can you see what's happening from logs in the log group /aws/lambda/env-permissionSetTopicProcessor
please? This is where the delta is calculated and processed. If there are no logs populated here when you are pushing the updated PS through S3, can you check what's happening in the log group /aws/lambda/env-psCuHandler
please? This is the user interface handler, and we will know if there's an issue at this layer.
Thank you, Leela
We have tested the Nested OU and account assignment to orgs with over 40 accounts and that is all working. An issue that did arise was doing account assignment at the root level with 126 accounts. This fails in the step function with the error
Error - Organizations.InvalidInputException Cause - You specified an invalid value for nextToken. You must get the value from the response to a previous call to the API. (Service: Organizations, Status Code: 400, Request ID: 8979fd76-2be8-4ba7-9dc5-c1c5de413487)
Root mapping is only mapping to five accounts, no error in lambda or no error email. When the step function was checked the error above was noticed.
Mapping file - root%all%aws-aes-engg-sso-ps%aws-aes-engg-sso%GROUP%ssofile
step input for failing step below. I replaced the sensitive details
{ "action": "create", "entityType": "root", "instanceArn": "<instanceArn>", "permissionSetArn": "<instanceArn>", "principalId": "<principalId>", "principalType": "GROUP", "targetType": "AWS_ACCOUNT", "topicArn": "<topicArn>", "pageSize": 5, "waitSeconds": 2, "supportNestedOU": "true", "tagKey": "", "emptyString": "", "tagValues": "", "ou_id": "", "resourceTypeFilters": "organizations:account", "listRootAccounts": { "Accounts": [ { "Arn": "<arn>", "Email": "<email>", "Id": "<id>", "JoinedMethod": "CREATED", "JoinedTimestamp": "2021-03-01T12:11:33.059Z", "Name": "<name>", "Status": "ACTIVE" }, { "Arn": "<arn>", "Email": "<email>", "Id": "<id>", "JoinedMethod": "CREATED", "JoinedTimestamp": "2021-08-09T13:14:55.607Z", "Name": "<name>", "Status": "ACTIVE" }, { "Arn": "<arn>", "Email": "<email>", "Id": "<id>", "JoinedMethod": "CREATED", "JoinedTimestamp": "2021-08-14T19:10:37.341Z", "Name": "<name>", "Status": "ACTIVE" }, { "Arn": "<arn>", "Email": "<email>", "Id": "<id>", "JoinedMethod": "CREATED", "JoinedTimestamp": "2021-04-28T11:13:22.576Z", "Name": "<name>", "Status": "ACTIVE" }, { "Arn": "<arn>", "Email": "<email>", "Id": "<id>", "JoinedMethod": "CREATED", "JoinedTimestamp": "2021-03-01T12:08:52.478Z", "Name": "<name>", "Status": "ACTIVE" } ], "NextToken": "H4sIAAAAAAAAADVU3U6aQRB9F65709s+R1+gSbkwbWqiiRdtmqCiIr8qWP8QRSgggooKouzMPNJ5hc4c0mSymT17dvbMxxl+ZTaya+srqz8ynz5+yKxn11a+fF/5mf36efVb1rEM7AX6BPPow9pc32Fd6BR6A7uBzoNj98QNNoHtwy5g27ATWAVWhBVIKENL0AOoEy6hHlewFqwJO4XtwTZhVVgd2oBeQM+Jn3Gts8gBrBTX9S/sGvYG68GmlDejAI8FTGGJp3MiTn4mOIH2oW3oK7QDHUOH5LTjVJvQd+gCuhNhOWiLj+4ROQo9EZ6fkVnglSa3rUjskPrzMFd+DPV+vUiRdVpskHx5gIyQRpAbyC2kC0mQAaSH1IfccWs8XeITbhckTyFtbueQGeSFuIMCeWf+BNmB1CB/kCqQBiQP2YRUIbuQAqQF2YJsR6Qc5B5ShxxBLkk4J9kraFSIJ/xKk6dOO4mj1EaaIt1FC2mIdByy0zXxPnFD6sTqbUonJKVHpAnXBZlzEg6RhI2047mUWHASGlxYlD0ItckbOUJqUGSd0aLOKyYXbHAfUqTsCtcyb5WQikg15mcRfte/iT3AxuFkHfGnv4M9/nfCgm4ZwG5pGKGT3dId2mxG1zWYu+dfYSMmXrBGc5ajVKx5aIXeuw0jhbGfWaTNmhXmbtFdGqbAF7usP2aS41AMeDqhnU75LsckDOzT0WW1ARu5ht4HogJ9gL7RliNoIu6gT3EPOuPao+ZxyFO/PoyIWZhwLoa0utfPU/bS/45U6fwy3X4V3elyQJYDnqPtd5lscfar/B8o8WtskVaDbmd+/wMn6ghzcgQAAA==" } }
Hi @jjleigh ,
We're unable to reproduce the root issue that you are seeing with the step functions. We've tried the below use cases in 2 of our dev environments and are unable to reproduce the issue. @jmejco had done a stellar job of setting up an AWS org with ~ 60 accounts , so that we could stress test the solution and document processing teams for average loads. These tests were done as part of this validation
root
scope and did not see the exception you're seeingroot
scope and did not see the exception you're seeingroot
scope assignment with pageSize reduced to 1, and the step function looped through 6 times using nextToken parameter successfully.As this is specific to your environment, could you please validate the following:
root
scope account assignment and see if you're seeing the same exception. If you are , can you take the nextToken value from the execution payload and run a CLI command with that to see if that works. For ex aws organizations list-accounts --page-size 5 --next-token <your token from execution payload>
? Hi @allquixotic , we are in the process of doing more stress tests on the solution, but so far we've validated ~ 300 account assignment create/delete operations and ensured that the throttle tuning works with 0 failures.
If it's feasible for you , could you try the same bulk load flow you were trying earlier with the latest commit ID please? Only if that's feasible for you.
Thank you, Leela
Hi @jjleigh ,
We're unable to reproduce the root issue that you are seeing with the step functions. We've tried the below use cases in 2 of our dev environments and are unable to reproduce the issue. @jmejco had done a stellar job of setting up an AWS org with ~ 60 accounts , so that we could stress test the solution and document processing teams for average loads. These tests were done as part of this validation
- He triggered 300 create account assignments in parallel using
root
scope and did not see the exception you're seeing- He then triggered 300 delete acccount assignments in parallel using
root
scope and did not see the exception you're seeing- On a much smaller environment ~ 6 accounts , I used the same permission set name and group name that you're using and triggered
root
scope assignment with pageSize reduced to 1, and the step function looped through 6 times using nextToken parameter successfully.As this is specific to your environment, could you please validate the following:
- Re-run a
root
scope account assignment and see if you're seeing the same exception. If you are , can you take the nextToken value from the execution payload and run a CLI command with that to see if that works. For exaws organizations list-accounts --page-size 5 --next-token <your token from execution payload>
?
We tried this and got a message saying: Cannot specify --no-paginate along with pagination arguements: --page-size
so we tried without the page size flag and got the same error from the step function that I shared above.
@jjleigh @leelalagudu
The correct argument to provide the NextToken
in the CLI is --starting-token
.
Hi, with the latest commits, I uploaded one item to links_data
per second and received no rate limit issues. This is much improved from previous commits.
However, I still can reproduce two issues that may be outside the scope of this PR:
When deleting a permission set attached to an account, I get an SNS error email saying "Cannot delete permission set as there are existing links that reference the permission set". Why can't it remove those links before deleting it?
When uploading a new copy of the deleted permission set (deleted from S3, not from SSO) with AWS managed policies, it doesn't add the AWS managed policies.
There's also a race condition, where, while waiting for accounts to be deprovisioned (which can take minutes if you have dozens of accounts provisioned for the same permission set), trying to delete the permission set won't work.
So to do something like clear the links_data and permission_set data in S3 for a given permission set, you first have to delete the links_data, wait, then once it's deprovisioned, delete the permission_set file.
Hi @allquixotic ,
Thank you, Leela
I'm OK with most of what you wrote and agree, but I tried something completely new:
First, I deleted my permission set (after removing all the account links -- I had to do 9 of them by hand, because for some reason, deleting all the links data relative to this permset didn't delete all the account links, and none of them were established by hand)
Once confirming the permission set did not exist at all in AWS SSO, I re-uploaded the .json into permission_sets, and the account links into links_data. The link was root%all%PermSet-ps%GroupNameHere%GROUP%ssofile
.
Then, confusingly, I neither received an SNS error notification about this, nor did the permission set get created.
So, something is failing at a low level with this particular permission set, so that it just won't create it, and won't tell me why it can't via the usual error email.
Here is the permission set verbatim, except that something
in the permissionSetName is replaced with 16 lowercase and uppercase letters that are specific to our business case.
{
"permissionSetName": "something",
"sessionDurationInMinutes": "720",
"managedPoliciesArnList": [
"arn:aws:iam::aws:policy/AdministratorAccess",
"arn:aws:iam::aws:policy/AWSSupportAccess"
],
"inlinePolicyDocument": {},
"relayState": "https://us-east-1.console.aws.amazon.com/console/home?region=us-east-1#",
"tags": [
{
"Key": "versionid",
"Value": "01"
}
]
}
I'll note that SSO Extensions does create and provision many other permission sets successfully, but this specific one just doesn't work for some reason. I'll check the log groups you suggested earlier when I have a chance.
Thank you for the updates @allquixotic . When you do get a chance, please have a look at those log files and see what's happening with this specific permission set. If the issue persists, please can you create a new issue so that this could be tracked separately.
For now, the PR would be approved and merged in the next few days as all the target stories for the PR are now handled. Do let us know if you still see issues that are in the scope of this PR i.e. nested OU support (both for provisioning & self sustaining flows), handle throttling for any organizations load, updated permission set schema definitions to align with standard AWS SSO admin API.
Thank you, Leela
Thanks, I agree, this PR seems to resolve the issues it claims to. No need to hold the PR for other unrelated issues. Just reporting the results of my testing.
Tell you what; to make it easier to forget about the messages in this PR comments, I'll open a new issue with my findings about that bad permission set that isn't being created or updated.
Issue #, if available: Fixes #88, Fixes #87, Fixes #86, Fixes #82
Description of changes:
NotAction
,NotResource
,NotCondition
keys as well as allow actions/resources elements to be specified as either as single string (or) array of strings. With these changes, the permission set object aligns completely to the schema spec supported by AWS SSO admin API for a permission set object, except forsessionDurationInMinutes
. The solution still expects this to be a string instead of a number as it's necessary for ISO 86401 conversions.60 minutes
when unset. So, the permission set with an empty session duration set by the user is created in SSO with a value of60 minutes
waiter
blocks within the pagination process when the solution discovers the target list of accounts that fall under aroot, ou_id, or account_tag
scope , to avoid getting throttled.SupportNestedOU
, the solution would automatically discover all the accounts under the specified ou_id , and traverses the complete tree including any child OU's. Similarly, when this behaviour is enabled, in case of an account moving from one OU to another OU, the solution would also traverse the tree upwards until root to discover the complete list of parent OU's and compute the list of account assignments that are to be added/removed.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.