aws-solutions / aws-control-tower-customizations

The Customizations for AWS Control Tower solution combines AWS Control Tower and other highly-available, trusted AWS services to help customers more quickly set up a secure, multi-account AWS environment using AWS best practices.
https://docs.aws.amazon.com/controltower/latest/userguide/cfct-overview.html
Apache License 2.0

Slowness in deploying stacksets for organization with 800+ accounts #174

Closed bdesika-aws closed 3 months ago

bdesika-aws commented 8 months ago

Describe the bug: The customer reported that the CfCT pipeline is slow, regardless of whether it is vending a new account or updating existing stacksets.

To Reproduce: Create an organization with 800+ accounts across 8+ regions. Create a manifest with any sample template and point deployment_targets at all the OUs that contain those accounts. Then perform a code release to the pipeline without changing any template. Each stackset takes approximately 30 minutes, even though there are no changes to the templates or change sets.

Because the customer has 50+ customizations, the pipeline takes 8-14 hours to complete.

Expected behavior: The pipeline should complete in less than 5 minutes per stackset when there are no changes to the customization templates.

Screenshots: [image: Screenshot 2023-10-27 at 2 29 03 PM]

Additional context: I am from ProServe and investigated the issue. After digging into the code, I found that the function "list_stack_instances_account_ids" has MaxResults set to just 20. In an environment this large, with 7600+ stack instances, it takes 378 iterations to build the list of stack instances, and on top of that the Step Functions state machine waits 5 seconds between each function call. At 378 iterations, the waits alone add up to about 31.5 minutes, so it takes 30+ minutes just to build the list.
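The arithmetic behind that estimate can be sketched as below. This is only a back-of-the-envelope model, not code from the solution: the function name `estimated_wait` is hypothetical, 7,600 is used as a round figure for the "7600+" stack instances (which gives 380 pages rather than the 378 iterations observed), and API call latency is ignored.

```python
import math


def estimated_wait(stack_instances, page_size, wait_seconds):
    """Return (pages, total_wait_seconds) for a paginated listing.

    Models only the fixed wait between calls; the time spent inside
    each API call is not included.
    """
    pages = math.ceil(stack_instances / page_size)
    return pages, pages * wait_seconds


# Reported settings: MaxResults=20 with a 5-second wait per iteration.
print(estimated_wait(7600, 20, 5))    # (380, 1900), about 31.7 minutes

# Proposed settings: MaxResults=100 with a 1-second wait per iteration.
print(estimated_wait(7600, 100, 1))   # (76, 76), a bit over 1 minute
```

Under these assumptions the wait overhead drops from roughly half an hour to about a minute per stackset; whatever remains of the run time is spent in the API calls themselves.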

Solution: If we change MaxResults to 100 and reduce the wait time to 1 second, the state machine completes the task in about 2.51 minutes, which is a significant performance improvement. This change has been tested in the environment and works fine, and I did not see any API throttling as a result. I request that the team review this fix and include it in a future release so that other customers benefit. Thanks!
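For illustration, a minimal sketch of fetching account IDs with the larger page size might look like the following. It is not the CfCT implementation: the function name `collect_stack_instance_account_ids` is hypothetical, and it loops inside a single function rather than across state machine iterations. It assumes a boto3 CloudFormation client, whose `list_stack_instances` call accepts `StackSetName`, `MaxResults` (up to 100), and `NextToken`.

```python
def collect_stack_instance_account_ids(cfn_client, stack_set_name, page_size=100):
    """Collect the unique account IDs that hold instances of a stack set.

    cfn_client is expected to behave like boto3.client("cloudformation").
    """
    account_ids = set()
    kwargs = {"StackSetName": stack_set_name, "MaxResults": page_size}
    while True:
        response = cfn_client.list_stack_instances(**kwargs)
        for summary in response.get("Summaries", []):
            account_ids.add(summary["Account"])
        next_token = response.get("NextToken")
        if not next_token:
            break  # no more pages
        kwargs["NextToken"] = next_token
    return sorted(account_ids)


# Example usage (requires AWS credentials; the stack set name is a placeholder):
#   import boto3
#   ids = collect_stack_instance_account_ids(
#       boto3.client("cloudformation"), "my-stack-set")
```

With a page size of 100, the same 7600+ stack instances need roughly a fifth as many calls as with a page size of 20, which is where most of the reported saving comes from.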

hanafya commented 8 months ago

@bdesika-aws I have cut a backlog item to review this request.

stumins commented 7 months ago

Hi @bdesika-aws,

CFCT v2.7.0 optimizes how these account IDs are fetched. Please upgrade to v2.7.0 and let us know if you do not see an improvement in performance.