awslabs / landing-zone-accelerator-on-aws

Deploy a multi-account cloud foundation to support highly-regulated workloads and complex compliance requirements.
https://aws.amazon.com/solutions/implementations/landing-zone-accelerator-on-aws/
Apache License 2.0

`customizations-config.yaml` does not appear to properly implement `runOrder` for `cloudFormationStackSets` #63

Open atfurman opened 1 year ago

atfurman commented 1 year ago

Describe the bug

As a result of https://github.com/awslabs/landing-zone-accelerator-on-aws/issues/51, we have refactored some of our stackSets into multiple templates. Because these templates now have run-order dependencies, we attempted to enforce run order in the following way:

# Note that template files must be updated for each deployment, as there is currently no support in LZA for passing parameters
homeRegion: &HOME_REGION us-east-1
customizations:
  cloudFormationStackSets:
    - capabilities: [CAPABILITY_NAMED_IAM]
      deploymentTargets:
        organizationalUnits:
          - Root
      description: ATOM ThreatAlert Scan Role
      name: ATOM-ThreatAlert-Scan-Role
      regions: 
        - *HOME_REGION
      runOrder: 1
      template: cloudformation/atom-threatalert-scan-role.yaml
      terminationProtection: true
      # Customizations are limited to a max length of 51200 bytes, while the atom-aws-alerts clocks in at 94532 bytes.
      # ATOM AWS Alerts are chunked into three separate templates below
      # This limitation will be lifted in the future, and is being tracked in https://github.com/awslabs/landing-zone-accelerator-on-aws/issues/51
    - capabilities: [CAPABILITY_NAMED_IAM]
      deploymentTargets:
        organizationalUnits:
          - Root
      description: ATOM CloudWatch Alerting infrastructure and CIS alerts
      name: ATOM-AWS-Alerts-Infrastructure
      regions: 
        - *HOME_REGION
      runOrder: 1
      template: cloudformation/atom-aws-alerts-infrastructure.yaml
      terminationProtection: true
    - capabilities: [CAPABILITY_NAMED_IAM]
      deploymentTargets:
        organizationalUnits:
          - Root
      description: ATOM Splunk Cross Account Role
      name: ATOM-Splunk-Cross-Account-Role
      regions: 
        - *HOME_REGION
      runOrder: 1
      template: cloudformation/atom-splunk-cross-account-role.yml
    - capabilities: []
      deploymentTargets:
        organizationalUnits:
          - Root
      description: ATOM AWS Alerts
      name: ATOM-AWS-Alerts-1
      regions: 
        - *HOME_REGION
      runOrder: 2
      template: cloudformation/atom-aws-alerts-1.yaml
      terminationProtection: true
    - capabilities: []
      deploymentTargets:
        organizationalUnits:
          - Root
      description: ATOM AWS Alerts
      name: ATOM-AWS-Alerts-2
      regions: 
        - *HOME_REGION
      runOrder: 2
      template: cloudformation/atom-aws-alerts-2.yaml
      terminationProtection: true
      # FIPS requirements are not universally applicable. If FIPS is not applicable, comment out this stackset
    - capabilities: [CAPABILITY_NAMED_IAM]
      deploymentTargets:
        organizationalUnits:
          - Infrastructure
      description: SSM association for EC2 instances running Windows or Linux. Association checks FIPS status and publishes alerts if instances are not running in FIPS mode
      name: ATOM-SSM-CloudWatch-FIPS-Validator
      regions: 
        - *HOME_REGION
      runOrder: 2
      template: cloudformation/atom-ssm-cw-fips.yaml
      terminationProtection: true

However, when deploying this configuration, the following error was encountered:

❌  AWSAccelerator-CustomizationsStack-838035265473-us-east-1 failed: Error: The stack named AWSAccelerator-CustomizationsStack-838035265473-us-east-1 failed to deploy: UPDATE_ROLLBACK_COMPLETE (Update successful. One or more resources could not be deleted.): Resource handler returned message: "Resource of type 'Stack set operation [cd6f0cf4-6783-44c3-8352-84b20f5ab2f2] was unexpectedly stopped or failed. status reason(s): [Unable to fetch parameters [/atom/alerts/snsPrimaryTopic] from parameter store for this account.]' with identifier 'ATOM-SSM-CloudWatch-FIPS-Validator:ea9d5edd-4eff-420d-8963-7ee83e2a419f' did not stabilize." (RequestToken: f4a69153-9c73-0f3f-abaf-83f3c194293c, HandlerErrorCode: NotStabilized), Resource handler returned message: "Operation 9b7fb3c7-a217-4cdb-a3ac-2ac2df66b143 on StackSet arn:aws:cloudformation:us-east-1:838035265473:stackset/ATOM-ThreatAlert-Scan-Role:8583acf8-5221-40e4-aa0a-30dae216e351 is in progress (Service: CloudFormation, Status Code: 409, Request ID: f1e164c3-e9f4-4026-a1ca-bc9ccd55ced9)" (RequestToken: 61969314-d635-367c-b31b-09bdee53b5df, HandlerErrorCode: GeneralServiceException), Resource handler returned message: "Operation 561495e8-f5a0-4ae4-8ce5-dc4bc589c70e on StackSet arn:aws:cloudformation:us-east-1:838035265473:stackset/ATOM-Splunk-Cross-Account-Role:eef0d709-8ca9-4be2-8935-c4988aa3994d is in progress (Service: CloudFormation, Status Code: 409, Request ID: 8ca2e553-0189-4fbb-afcb-181443820e25)" (RequestToken: f816fa85-60b1-2f34-2fe4-3bd6a4e7f2d8, HandlerErrorCode: GeneralServiceException), Resource handler returned message: "Operation 70dc8db2-0ebf-4b46-928a-ada733705e28 on StackSet arn:aws:cloudformation:us-east-1:838035265473:stackset/ATOM-AWS-Alerts-Infrastructure:511f6cc5-2eac-4d23-b6f4-acb82791e3c6 is in progress (Service: CloudFormation, Status Code: 409, Request ID: e0d264fa-8e38-4b0d-8f93-1478698dcc76)" (RequestToken: 90183a28-859a-510c-8567-6325a6899a60, HandlerErrorCode: GeneralServiceException), 
Resource handler returned message: "Operation 70dc8db2-0ebf-4b46-928a-ada733705e28 on StackSet arn:aws:cloudformation:us-east-1:838035265473:stackset/ATOM-AWS-Alerts-Infrastructure:511f6cc5-2eac-4d23-b6f4-acb82791e3c6 is in progress (Service: CloudFormation, Status Code: 409, Request ID: 39310035-ad9b-42b1-a851-7b4345099d92)" (RequestToken: f2d8f689-726d-7bf7-680d-c6855a964c3c, HandlerErrorCode: GeneralServiceException)

This error should not occur if runOrder is working as expected: cloudformation/atom-aws-alerts-infrastructure.yaml creates the /atom/alerts/snsPrimaryTopic parameter and has a runOrder of 1, while cloudformation/atom-ssm-cw-fips.yaml, which failed to fetch it, has a runOrder of 2.

Interrogating AWSAccelerator-CustomizationsStack- in the CloudFormation console indicates that all stackSet creations were initiated at the same time:

[Screenshot: CloudFormation console showing the stackSet creation events]

Expected behavior

I would expect the Accelerator to respect runOrder arguments when deploying stackSets.

erwaxler commented 1 year ago

Hi @atfurman, thank you for creating this issue! The runOrder property does not currently exist on the CloudFormationStackSetConfig object, only on CloudFormationStackConfig.

This is because, while we can reliably enforce the creation order of each StackSet resource, we cannot guarantee the order in which the StackSets create their stack instances.

Are you able to perform this deployment using cloudFormationStacks rather than cloudFormationStackSets? This would allow you to use the runOrder property, and may eliminate the need altogether, as you will not need to wait for v1.3.1 to deploy templates larger than 51200 bytes.
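As a rough sketch of that suggestion, two of the stack sets from the report could be expressed as `cloudFormationStacks` entries with `runOrder`. The field names mirror the original configuration. `capabilities` is left out on the assumption that it applies only to stack sets, and the second description is shortened for illustration, so check the fields against the configuration reference for your LZA version.

```yaml
customizations:
  cloudFormationStacks:
    # Deployed first: per the report, this template creates /atom/alerts/snsPrimaryTopic
    - deploymentTargets:
        organizationalUnits:
          - Root
      description: ATOM CloudWatch Alerting infrastructure and CIS alerts
      name: ATOM-AWS-Alerts-Infrastructure
      regions:
        - us-east-1
      runOrder: 1
      template: cloudformation/atom-aws-alerts-infrastructure.yaml
      terminationProtection: true
    # Deployed second: this template reads the parameter created above
    - deploymentTargets:
        organizationalUnits:
          - Infrastructure
      description: SSM association that checks FIPS status on EC2 instances
      name: ATOM-SSM-CloudWatch-FIPS-Validator
      regions:
        - us-east-1
      runOrder: 2
      template: cloudformation/atom-ssm-cw-fips.yaml
      terminationProtection: true
```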

atfurman commented 1 year ago

Hi @erwaxler, after discussion with @rgd11 we have pursued a solution outside of LZA for the interim. As LZA does not fully manage CloudFormation stacks (it does not support deleting stacks, for instance), we did not want to end up having to manually clean up a large number of stacks created by LZA across the organization. We will wait until larger templates are supported before trying to use LZA to accomplish this.

erwaxler commented 1 year ago

@atfurman Thank you for following up. Regarding the orphaned stacks, keep an eye on this CDK issue as well. If this becomes supported through CDK, that will immediately solve the issue. I will go ahead and close this ticket, and I will keep you updated through #51 regarding the size limits.

erwaxler commented 1 month ago

@atfurman I'm pleased to say that we can now support this feature thanks to the work in this pull request: https://github.com/awslabs/landing-zone-accelerator-on-aws/pull/575

I'll be reopening this issue for visibility. I will close it once v1.10.0, which includes this functionality, is publicly available.
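For readers landing here later, this is a hedged sketch of how the ordering might be declared once v1.10.0 is available. It assumes the feature is exposed as a `dependsOn` list of stack set names on each `cloudFormationStackSets` entry. The thread does not name the property, so confirm it against the v1.10.0 release notes and configuration reference before relying on it. The second description is shortened for illustration.

```yaml
customizations:
  cloudFormationStackSets:
    - capabilities: [CAPABILITY_NAMED_IAM]
      deploymentTargets:
        organizationalUnits:
          - Root
      description: ATOM CloudWatch Alerting infrastructure and CIS alerts
      name: ATOM-AWS-Alerts-Infrastructure
      regions:
        - us-east-1
      template: cloudformation/atom-aws-alerts-infrastructure.yaml
    - capabilities: [CAPABILITY_NAMED_IAM]
      deploymentTargets:
        organizationalUnits:
          - Infrastructure
      description: SSM association that checks FIPS status on EC2 instances
      name: ATOM-SSM-CloudWatch-FIPS-Validator
      regions:
        - us-east-1
      # Assumed property name: deploy only after the named stack set has finished
      dependsOn:
        - ATOM-AWS-Alerts-Infrastructure
      template: cloudformation/atom-ssm-cw-fips.yaml
```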

Thanks again for your support of the Landing Zone Accelerator!

mbevc1 commented 1 month ago

Awesome, thanks!