Expectations and recommendations for managing config drift in LZA guardrail SCPs?

roncanepa-tufts commented 9 months ago

Hello,

A few weeks ago, I received an email from AWS that the IamRolesStatement policy in our guardrail SCPs might need to be updated in order to address a potential for exploitation. The email included a sample/example configuration, and the more I compared that with our policy, the more differences I began to notice. I submitted an AWS Support ticket for clarification (and can provide that case number via private channels if that's helpful) and am still trying to work through this. We received our files directly from AWS Professional Services in an engagement with them to help us launch our landing zone right at the end of 2022/beginning of 2023. We launched on v1.2.x of the LZA.

Discovering these differences led to further research, which made me discover yet more drift even among AWS's own samples/examples, and this has made me wonder about expectations and how config drift is going to be handled in the longer term?

As an example of what I'm discussing, our IamRolesStatement policy looks like this:

{
"Sid": "IamRolesStatement",
"Effect": "Deny",
"Action": [
"iam:AttachRolePolicy",
"iam:CreateAccountAlias",
"iam:DeleteAccountAlias",
"iam:CreateUser",
"iam:DeleteUser",
"iam:CreateRole",
"iam:DeleteRole",
"iam:CreatePolicy",
"iam:DeletePolicy",
"iam:DeleteRolePermissionsBoundary",
"iam:DeleteRolePolicy",
"iam:DetachRolePolicy",
"iam:PutRolePermissionsBoundary",
"iam:PutRolePolicy",
"iam:UpdateAssumeRolePolicy",
"iam:UpdateRole",
"iam:UpdateRoleDescription"
],
"Resource": [
"arn:${PARTITION}:iam::*:role/${ACCELERATOR_PREFIX}-*"
],
"Condition": {
"ArnNotLike": {
"aws:PrincipalARN": [
"arn:${PARTITION}:iam::*:role/${ACCELERATOR_PREFIX}-*",
"arn:${PARTITION}:iam::*:role/AWSControlTowerExecution",
"arn:${PARTITION}:iam::*:role/cdk-accel-*"
]
}
}
},

however, the example in the email, and the sample config in the AWS repo look like this:

    {
      "Sid": "IamRolesStatement",
      "Effect": "Deny",
      "Action": ["iam:*"],
      "Resource": [
        "arn:${PARTITION}:iam::*:role/${ACCELERATOR_PREFIX}-*",
        "arn:${PARTITION}:iam::*:role/cdk-accel-*",
        "arn:${PARTITION}:iam::*:role/${MANAGEMENT_ACCOUNT_ACCESS_ROLE}"
      ],
      "Condition": {
        "ArnNotLike": {
          "aws:PrincipalARN": [
            "arn:${PARTITION}:iam::*:role/${ACCELERATOR_PREFIX}-*",
            "arn:${PARTITION}:iam::*:role/cdk-accel-*",
            "arn:${PARTITION}:iam::*:role/${MANAGEMENT_ACCOUNT_ACCESS_ROLE}",
            "arn:${PARTITION}:iam::*:role/AWSServiceRoleForConfig"
          ]
        }

Note the difference in both the Action and the aws:PrincipalARN of the Condition.

I also looked up the sample for education, where I noticed more differences, both between my file and the education example as well as between the education example and the "regular" LZA sample mentioned above, although ours does seem to more closely resemble the education version:

    {
      "Sid": "IamRolesStatement",
      "Effect": "Deny",
      "Action": [
        "iam:AttachRolePolicy",
        "iam:CreateAccountAlias",
        "iam:DeleteAccountAlias",
        "iam:CreateUser",
        "iam:DeleteUser",
        "iam:CreateRole",
        "iam:DeleteRole",
        "iam:CreatePolicy",
        "iam:DeletePolicy",
        "iam:DeleteRolePermissionsBoundary",
        "iam:DeleteRolePolicy",
        "iam:DetachRolePolicy",
        "iam:PutRolePermissionsBoundary",
        "iam:PutRolePolicy",
        "iam:UpdateAssumeRolePolicy",
        "iam:UpdateRole",
        "iam:UpdateRoleDescription"
      ],
      "Resource": [
        "arn:${PARTITION}:iam::*:role/${ACCELERATOR_PREFIX}-*"
      ],
      "Condition": {
        "ArnNotLike": {
          "aws:PrincipalARN": [
            "arn:${PARTITION}:iam::*:role/${ACCELERATOR_PREFIX}-*",
            "arn:${PARTITION}:iam::*:role/${MANAGEMENT_ACCOUNT_ACCESS_ROLE}",
            "arn:${PARTITION}:iam::*:role/cdk-accel-*"
          ]
        }
      }

In our config, we do have the following declared in global-config.yaml: managementAccountAccessRole: AWSControlTowerExecution

which seems to be to be intended for use as a variable substitution but our policy doesn't appear to be using it. Here's a table showing the various permutations of aws:PrincipalARN between the three versions of the policy (where a - means it's missing):

	lza-sample-config	mine	aws-for-education
${ACCELERATOR_PREFIX}-*	Y	Y	Y
AWSControlTowerExecution	-	Y	-
cdk-accel-*	Y	Y	Y
${MANAGEMENT_ACCOUNT_ACCESS_ROLE}	Y	-	Y
AWSServiceRoleForConfig	Y	-	-

So there's currently a 3-way drift between the regular lza config example, our copy of the policy, and the config example for education.

It feels relevant to highlight this and to ask the question, because in my view, these policies are purely to protect LZA-created resources and are thus ultimately "internal implementation details" of the LZA itself, versus policies we might write to support our business use cases. Notably, updating the LZA itself won't update these policies because these policies live in the "config" repo and not the LZA repo. I know that the release notes for v1.5 of the LZA mentioned these changes and that we should ["review] configuration changes made to the lza-sample-config and determine which changes you need to apply to your configuration", but given that these are OOTB policies full of internal details, how am I to know what changes can/should be made and the potential impacts of doing so? I have workload accounts in production usage and so I do need to tread carefully here. Am hoping to get some feedback from the LZA team on expectations and future vision in this regard.

Thank you!

erwaxler commented 9 months ago

Thank you for providing your investigation into the variance between the guardrail service control policy (SCP) that is currently in your Landing Zone Accelerator on AWS (LZA) configuration and the provided sample configurations made available within this GitHub repository.

I would first like to address the actions requested as part of the email that you received earlier this month. The portion of the guardrails-2.json SCP that was provided as part of our lza-sample-config did not previously include the clause for protecting the "arn:${PARTITION}:iam:::role/cdk-accel-" Resource in the policy statement. Reviewing the policy statement that you provided in this issue, it does have the cdk-accel-* resource clause added and no further action would be required. Similarly the ${MANAGEMENT_ACCOUNT_ACCESS_ROLE} is replaced with AWSControlTowerExecution upon deployment in environments with Control Tower enabled as described here, so no action is required.

Furthermore, we are in the process of moving sample configurations to their own dedicated repositories which will allow you to follow and track changes to only the specific configurations that you care about.

roncanepa-tufts commented 9 months ago

Hello erwaxler, thank you for taking the time to reply.

The info about the cdk-accel-* being missing is quite helpful, as the email communication didn't state what the actual issue was.
I discovered the landing-zone-accelerator-on-aws-for-education repo previously, but it also hasn't been updated in a while, so it was hard to determine the relevance, especially given the other differences in our policy that I was seeing.

I think that's a big remaining question I have: why are the guardrail policies so different between the two examples?

erwaxler commented 8 months ago

Thank you for bringing this to our attention, I'll make sure to share this feedback with our team so we can be clearer in our future communications. Over time we have had many sample LZA configurations created to address various industry and compliance standards. While these different sample configurations had historically been included in the landing-zone-accelerator repository, we have started the process of moving sample configurations to their own independent repositories. This ensures the sample configurations can be appropriately updated by their respective owners. We see two benefits of this workflow:

Industry leaders can better assess and guide the requirements of customers in their market segment
Industry sample configurations can be updated at a cadence independent of changes to the LZA source code

We will continue to iterate on this process as we evolve as a product, thank you for providing this feedback so we can continue to improve on the LZA customer experience.

awslabs / landing-zone-accelerator-on-aws

Expectations and recommendations for managing config drift in LZA guardrail SCPs? #382