aws-samples / landing-zone-accelerator-on-aws-for-cccs-medium

MIT No Attribution

Unable to deploy LZA - too many SCP Limit #10

Open JimToupet opened 18 hours ago

JimToupet commented 18 hours ago

We are trying to deploy version 1.9.2 to a brand-new account. After enabling Control Tower (LZ version 3.3), we saw that two sets of SCPs (named aws-guardrails-xxxx) are created and attached to each OU. In a previous deployment of the same versions (LZ 3.3 and LZA 1.9.2) a couple of weeks ago, only one aws-guardrails-xxxx SCP was created and attached to each OU. Looking deeper, it seems that these two SCPs are a split of the single one deployed previously.

As you can imagine, this causes us to hit the limit on the number of SCPs per OU when we try to apply the CCCS Medium LZA configuration.

Is anyone aware of this situation? What are the possible steps to solve this issue?
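To see how close each OU already is to the limit before applying the configuration, the SCPs attached to each target can be counted. The sketch below assumes Python with boto3 and the AWS Organizations quota of five SCPs attached directly to a root, OU, or account. The `needed` argument (how many SCPs the CCCS Medium configuration would add to an OU) and the helper names are hypothetical, so take the real number from your own configuration.

```python
from typing import Dict, Iterable, List

# AWS Organizations quota: at most 5 SCPs attached directly to one root, OU, or account.
# The AWS-managed FullAWSAccess policy counts toward this number.
MAX_SCPS_PER_TARGET = 5


def scp_overflow(attachments: Dict[str, List[str]], needed: int,
                 limit: int = MAX_SCPS_PER_TARGET) -> Dict[str, int]:
    """Return {target_id: excess} for targets that would exceed `limit`
    if `needed` more SCPs were attached (hypothetical helper)."""
    overflow: Dict[str, int] = {}
    for target, policies in attachments.items():
        excess = len(policies) + needed - limit
        if excess > 0:
            overflow[target] = excess
    return overflow


def fetch_scp_attachments(target_ids: Iterable[str]) -> Dict[str, List[str]]:
    """List the names of SCPs attached directly to each target
    (needs boto3 and management-account credentials)."""
    import boto3  # imported here so scp_overflow() works without AWS access

    org = boto3.client("organizations")
    paginator = org.get_paginator("list_policies_for_target")
    result: Dict[str, List[str]] = {}
    for target in target_ids:
        names: List[str] = []
        for page in paginator.paginate(TargetId=target, Filter="SERVICE_CONTROL_POLICY"):
            names.extend(policy["Name"] for policy in page["Policies"])
        result[target] = names
    return result
```

For example, an OU that typically carries FullAWSAccess plus two aws-guardrails-xxxx SCPs has only two free slots, so `scp_overflow(fetch_scp_attachments(["ou-xxxx-xxxxxxxx"]), needed=3)` would report an excess of one for it.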

oliviergaumond commented 18 hours ago

Just to confirm, you didn't enable the Control Tower Region deny control or other controls that would have caused Control Tower to create more SCPs?

These are the most common cases where I have seen multiple SCPs created by Control Tower. If that is not the case, I recommend you open a support ticket with Control Tower so they can help you identify what caused the creation of multiple SCPs.

khris-zeroeyes commented 18 hours ago

You are not alone. I recently had a similar experience when attempting to pivot from the standard configuration to the TSE-SE configuration (whose config is identical to CCCS Medium).

At first I attempted to just clean up the management account, but since that left accounts in the organization, LZA was unable to manage Control Tower on my behalf. So I set up Control Tower, which put in place mandatory controls that conflicted with those configured by LZA with TSE-SE. Any attempt to disable those mandatory controls was in vain, since LZA would reconcile Control Tower early in the pipeline, putting the conflicting controls back in place.

At this point I decided to do some further cleanup, so I made the "mistake" of closing the accounts before removing them from the organization (why would I want to keep them open if I was trying to start anew?). That turned out to be a major time sink: it cost me a week of correspondence with AWS Support to reopen the accounts so they could be made standalone, even though I had no intention of keeping them open.

After a week, I finally had a clean slate (or so I thought). Uninstalling the solution doesn't remove all resources, including IAM roles and KMS aliases. However, cleaning up those resources was also in vain, since at this point the pipeline had already created accounts in the organization. For LZA to manage Control Tower on my behalf, I would need to remove those accounts from the organization. But guess what: there's a 7-day moratorium on removing accounts from the organization that were created in it rather than invited.
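Before deciding whether to wait or start over, it can help to know when each account becomes removable. This is a minimal sketch, assuming (as described above) that accounts created inside the organization must be at least seven days old before removal, while invited accounts are not subject to this wait. It relies on the `JoinedMethod` and `JoinedTimestamp` fields that AWS Organizations returns for each account; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Assumption: accounts *created* in the organization can be removed only after 7 days;
# invited accounts are not subject to this waiting period.
CREATED_ACCOUNT_MIN_AGE = timedelta(days=7)


def earliest_removal(joined_timestamp: datetime, joined_method: str) -> datetime:
    """Earliest time the account can be removed from the organization."""
    if joined_method == "CREATED":
        return joined_timestamp + CREATED_ACCOUNT_MIN_AGE
    return joined_timestamp


def removable_now(joined_timestamp: datetime, joined_method: str,
                  now: Optional[datetime] = None) -> bool:
    """True once the waiting period (if any) has elapsed; `now` defaults to current UTC time."""
    now = now or datetime.now(timezone.utc)
    return earliest_removal(joined_timestamp, joined_method) <= now


def removal_dates() -> Dict[str, datetime]:
    """Map account ID -> earliest removal time (needs boto3 and management-account
    credentials). The management account is listed too, but it can never be removed."""
    import boto3

    org = boto3.client("organizations")
    dates: Dict[str, datetime] = {}
    for page in org.get_paginator("list_accounts").paginate():
        for account in page["Accounts"]:
            dates[account["Id"]] = earliest_removal(
                account["JoinedTimestamp"], account["JoinedMethod"])
    return dates
```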

Not having a week to spare, I created a new management account, but that one got similarly wedged when the organization incorrectly reported that the account quota was exceeded (there were only 2 accounts in the organization at that point, and the default quota is 10). So I had to abandon that account as well.

Now into my 3rd management account and I'm troubleshooting issues with this clause in the SCP preventing CloudFormation stack management in the LogArchive account. Also be forewarned that the Accounts stage in the Pipeline CodePipeline pipeline can require some babysitting on account of timeouts that occur repeatedly.
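The babysitting mentioned above usually amounts to retrying the stage that failed. As a sketch, the failed stages can be read from the pipeline state and retried through CodePipeline's retry API. The pipeline name is left as a parameter because it is not spelled out in this thread, and retrying only the failed actions is an assumption about what is safe for your pipeline; the helper names are hypothetical.

```python
from typing import Any, Dict, List


def failed_stages(pipeline_state: Dict[str, Any]) -> List[Dict[str, str]]:
    """Pick out stages whose latest execution failed, with the execution ID needed to retry."""
    failed: List[Dict[str, str]] = []
    for stage in pipeline_state.get("stageStates", []):
        latest = stage.get("latestExecution", {})
        if latest.get("status") == "Failed":
            failed.append({"stageName": stage["stageName"],
                           "pipelineExecutionId": latest["pipelineExecutionId"]})
    return failed


def retry_failed_stages(pipeline_name: str) -> List[str]:
    """Retry only the failed actions of each failed stage (needs boto3 and credentials).
    Returns the names of the stages that were retried."""
    import boto3

    codepipeline = boto3.client("codepipeline")
    state = codepipeline.get_pipeline_state(name=pipeline_name)
    retried: List[str] = []
    for stage in failed_stages(state):
        codepipeline.retry_stage_execution(
            pipelineName=pipeline_name,
            stageName=stage["stageName"],
            pipelineExecutionId=stage["pipelineExecutionId"],
            retryMode="FAILED_ACTIONS",
        )
        retried.append(stage["stageName"])
    return retried
```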

JimToupet commented 17 hours ago

@oliviergaumond Yes, we found that we had enabled the Region deny control. We updated the Control Tower settings to disable this control. We still have 2 SCPs per OU, and the Region deny controls were removed from one of the two.
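To confirm which of the remaining aws-guardrails-xxxx SCPs still carries Region deny statements, the policy content can be scanned for Deny statements conditioned on the `aws:RequestedRegion` key. This sketch assumes the Region deny control is expressed with that condition key; it does not rely on Control Tower's exact statement IDs, which are not shown in this thread, and the helper names are hypothetical.

```python
import json
from typing import List


def region_deny_sids(policy_document: str) -> List[str]:
    """Return the Sid (or '#<index>') of Deny statements that use aws:RequestedRegion."""
    doc = json.loads(policy_document)
    statements = doc.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be written as an object
        statements = [statements]
    found: List[str] = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Deny":
            continue
        # Condition maps operator -> {condition key -> values}; keys are case-insensitive.
        for key_block in stmt.get("Condition", {}).values():
            if any(key.lower() == "aws:requestedregion" for key in key_block):
                found.append(stmt.get("Sid", f"#{index}"))
                break
    return found


def fetch_policy_document(policy_id: str) -> str:
    """Fetch an SCP's JSON content by ID (needs boto3 and management-account credentials)."""
    import boto3

    org = boto3.client("organizations")
    return org.describe_policy(PolicyId=policy_id)["Policy"]["Content"]
```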

oliviergaumond commented 17 hours ago

Thanks for confirming. I have seen this behavior before. The best path is to check with AWS Support to see if they can help you with that.

JimToupet commented 17 hours ago

Thanks!

JimToupet commented 16 hours ago

@khris-zeroeyes

> Now into my 3rd management account and I'm troubleshooting issues with this clause in the SCP preventing CloudFormation stack management in the LogArchive account. Also be forewarned that the Accounts stage in the Pipeline CodePipeline pipeline can require some babysitting on account of timeouts that occur repeatedly.

This looks similar to what we already ran into.

In your global-config.yaml file, check this setting and make sure it's set to AWSControlTowerExecution if you use Control Tower.

managementAccountAccessRole: AWSControlTowerExecution # UPDATE: If using Control Tower, set to AWSControlTowerExecution
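To catch this mismatch before running the pipeline, the parsed global configuration can be checked. A minimal sketch, assuming the file has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`) and that Control Tower usage is indicated by the `controlTower.enable` key; the helper name is hypothetical.

```python
from typing import Any, Dict, List

CONTROL_TOWER_ROLE = "AWSControlTowerExecution"


def check_management_access_role(global_config: Dict[str, Any]) -> List[str]:
    """Return a list of problems with managementAccountAccessRole (empty if none)."""
    problems: List[str] = []
    # Assumption: Control Tower usage is flagged by controlTower.enable in global-config.yaml.
    uses_control_tower = bool((global_config.get("controlTower") or {}).get("enable"))
    role = global_config.get("managementAccountAccessRole")
    if not role:
        problems.append("managementAccountAccessRole is not set")
    elif uses_control_tower and role != CONTROL_TOWER_ROLE:
        problems.append(
            f"managementAccountAccessRole is {role!r}; expected {CONTROL_TOWER_ROLE!r} "
            "when Control Tower is enabled"
        )
    return problems
```

With PyYAML installed, `check_management_access_role(yaml.safe_load(open("global-config.yaml")))` returns an empty list when the setting matches.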

khris-zeroeyes commented 16 hours ago

@JimToupet I appreciate the tip, but I do have that set already. I will revisit each of the settings here to make sure I haven't missed anything.

managementAccountAccessRole: AWSControlTowerExecution

khris-zeroeyes commented 9 hours ago

As it turns out, a number of the SCP clauses were missing an exemption for the "arn:${PARTITION}:iam::*:role/AcceleratorPipelineDeploymentRole" ARN, but this was because I had overridden cdkOptions.customDeploymentRole in global-config.yaml while experimenting with an external pipeline deployment instead of the management account. PEBCAK.
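After overriding cdkOptions.customDeploymentRole, one way to spot the clauses that would block the custom role is to test its ARN against each Deny statement's exemption list. This sketch assumes the SCP clauses exempt roles through an `ArnNotLike` condition on `aws:PrincipalARN` and that `${PARTITION}` is the only placeholder to substitute; the wildcard matching approximates IAM's ArnLike (each of the six colon-separated ARN parts is compared separately). The role name `MyCustomDeploymentRole` used in testing is hypothetical.

```python
import json
import re
from typing import Iterable, List


def _wildcard_to_regex(segment: str) -> str:
    # '*' matches any run of characters, '?' exactly one; everything else is literal.
    return "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in segment)


def arn_like(pattern: str, arn: str, partition: str = "aws") -> bool:
    """Approximate IAM ArnLike: compare the six colon-separated ARN parts one by one."""
    pattern = pattern.replace("${PARTITION}", partition)
    p_parts, a_parts = pattern.split(":", 5), arn.split(":", 5)
    if len(p_parts) != 6 or len(a_parts) != 6:
        return False
    return all(re.fullmatch(_wildcard_to_regex(p), a, flags=re.DOTALL) is not None
               for p, a in zip(p_parts, a_parts))


def deny_sids_missing_exemption(policy_document: str, role_arn: str,
                                partition: str = "aws") -> List[str]:
    """Sids of Deny statements whose ArnNotLike aws:PrincipalARN list does not cover role_arn."""
    statements = json.loads(policy_document).get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    missing: List[str] = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Deny":
            continue
        arn_not_like = stmt.get("Condition", {}).get("ArnNotLike", {})
        patterns: Iterable[str] = next(
            (v for k, v in arn_not_like.items() if k.lower() == "aws:principalarn"), [])
        if isinstance(patterns, str):
            patterns = [patterns]
        if not any(arn_like(p, role_arn, partition) for p in patterns):
            missing.append(stmt.get("Sid", f"#{index}"))
    return missing
```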

> Also be forewarned that the Accounts stage in the Pipeline CodePipeline pipeline can require some babysitting on account of timeouts that occur repeatedly.

This is probably more PEBCAK: until today I hadn't requested an increase to the Lambda concurrent executions quota, as documented here.
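A quick way to see the current regional limit before kicking off the pipeline is Lambda's GetAccountSettings call. This sketch assumes boto3; the required value is a parameter because the thread does not state the number LZA recommends, so use the figure from the LZA documentation the comment refers to. The helper names are hypothetical.

```python
from typing import Any, Dict


def concurrency_shortfall(account_settings: Dict[str, Any], required: int) -> int:
    """How many more concurrent executions are needed to reach `required` (0 if enough)."""
    current = account_settings["AccountLimit"]["ConcurrentExecutions"]
    return max(0, required - current)


def fetch_account_settings(region: str) -> Dict[str, Any]:
    """Call Lambda GetAccountSettings in one Region (needs boto3 and credentials)."""
    import boto3

    return boto3.client("lambda", region_name=region).get_account_settings()
```

For example, `concurrency_shortfall(fetch_account_settings("ca-central-1"), required=1000)` returns 0 once the quota is at least 1000; the value 1000 is only illustrative, and the increase itself is requested through Service Quotas.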