Doesn't work out of the box with a new Control Tower AWS IAM account

tnh commented 8 months ago

Describe the bug

New AWS account.
New setup of AWS Control Tower.
Deploy the stack into a child account.
The stack fails due to two issues:
1. the default limit on Lambda concurrency
2. the default SCPs on AWS Config, which prevent the Config stack from being deployed:

User: arn:aws:sts::381492128592:assumed-role/AWSReservedSSO_AWSAdministratorAccess_2f70d98355d8c189/tnh@trenthornibrookgmail.onmicrosoft.com is not authorized to perform: config:PutConfigurationRecorder with an explicit deny in a service control policy (Service: AmazonConfig; Status Code: 400; Error Code: AccessDeniedException; Request ID: 72a33ad1-1cc3-4f45-aacb-d9b278e28651; Proxy: null)

To Reproduce

Steps to reproduce the behavior:

Create a new AWS account via cc.
Set up Control Tower in the master account with default settings (this creates a logging account, the OU hierarchy, and the default SCPs).
Vend a new child account.
Log into the new child account.
Deploy the latest AWS Perspective workload discovery tool.


tnh commented 8 months ago

First attempt: the Lambda limits.

I fixed this by raising a quota increase request, which was annoying because I had to dumpster dive to find the fix.
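Before raising a quota request, it can help to check the account's current Lambda concurrency limit. The sketch below is an assumption-laden helper, not part of Workload Discovery: it reads the `AccountLimit.ConcurrentExecutions` field from the response of boto3's Lambda `get_account_settings()` call and compares it with a threshold you choose. The `required` value is hypothetical; this thread does not say how much concurrency the stack actually needs.

```python
from typing import Any, Mapping


def has_enough_concurrency(account_settings: Mapping[str, Any], required: int) -> bool:
    """Return True if the account-level Lambda concurrency limit is at least `required`.

    `account_settings` is assumed to have the shape of the response from
    boto3's Lambda `get_account_settings()` call.
    """
    limit = account_settings["AccountLimit"]["ConcurrentExecutions"]
    return limit >= required


def fetch_account_settings(region: str) -> dict:
    """Fetch the live account settings (needs boto3 and AWS credentials)."""
    # Imported lazily so has_enough_concurrency() can be exercised offline.
    import boto3

    return boto3.client("lambda", region_name=region).get_account_settings()
```

For example, `has_enough_concurrency(fetch_account_settings("ap-southeast-2"), 100)` returns `False` in an account still on a low default limit, which is the signal to file the quota increase before deploying.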

tnh commented 8 months ago

The second failure was due to the default SCPs applied after the Control Tower deployment.

svozza commented 8 months ago

The config:PutConfigurationRecorder API is fundamental to how Workload Discovery works in SELF_MANAGED mode (every time you import an account through the UI, that API is invoked). If you are using Control Tower, then I presume you are working in an AWS Organization, so you should use AWS_ORGANIZATION mode. Note that, as per the documentation, if deploying in AWS_ORGANIZATION mode, the solution must be deployed in a delegated admin account where StackSets and multi-Region AWS Config capabilities have been enabled.

TRANTANKHOA commented 4 months ago

I raised the Lambda concurrency limit to get further, and

SearchResolversStack was stuck at the state below until the stack creation time exceeded the specified timeout:

OpenSearchSetup | - | Custom::OpenSearchSetup | CREATE_IN_PROGRESS
OpenSearchSetupFunction | workload-discovery-dev-Sea-OpenSearchSetupFunction-DdPOtw25siED | AWS::Lambda::Function | CREATE_COMPLETE
...

Then it failed.

workload-discovery-dev-Sea-OpenSearchSetupFunction-**** failed with:

START RequestId: 6b461cc0-0af9-4ebd-a905-35fbaf86bf9b Version: $LATEST
2024-06-04T14:05:09.088Z    6b461cc0-0af9-4ebd-a905-35fbaf86bf9b    INFO    {  RequestType: 'Delete',  ServiceToken: 'arn:aws:lambda:ap-southeast-2:992382856345:function:workload-discovery-dev-Sea-OpenSearchSetupFunction-DdPOtw25siED',  ResponseURL: 'https://cloudformation-custom-resource-response-apsoutheast2.s3-ap-southeast-2.amazonaws.com/arn%3Aaws%3Acloudformation%3Aap-southeast-2%3A992382856345%3Astack/workload-discovery-dev-SearchResolversStack-A4NNKBN2F1PY/05eb12d0-2273-11ef-ad12-0697873a487b%7COpenSearchSetup%7Cca6232e2-beb9-4e8d-be35-fda0e1a23b61?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20240604T140507Z&X-Amz-SignedHeaders=host&X-Amz-Expires=7200&X-Amz-Credential=AKIA6MM33IIZ4UOX3QFV%2F20240604%2Fap-southeast-2%2Fs3%2Faws4_request&X-Amz-Signature=f5035f740de6927512e4ec301c00362dfa84159f96f03a9f8aab02802c7c5800',  StackId: 'arn:aws:cloudformation:ap-southeast-2:992382856345:stack/workload-discovery-dev-SearchResolversStack-A4NNKBN2F1PY/05eb12d0-2273-11ef-ad12-0697873a487b',  RequestId: 'ca6232e2-beb9-4e8d-be35-fda0e1a23b61',  LogicalResourceId: 'OpenSearchSetup',  PhysicalResourceId: 'workload-discovery-dev-SearchResolversStack-A4NNKBN2F1PY-OpenSearchSetup-1XOPXU678HIXT',  ResourceType: 'Custom::OpenSearchSetup',  ResourceProperties: {    ServiceToken: 'arn:aws:lambda:ap-southeast-2:992382856345:function:workload-discovery-dev-Sea-OpenSearchSetupFunction-DdPOtw25siED',    SolutionVersion: 'v2.1.7'  }}
2024-06-04T14:05:09.128Z    6b461cc0-0af9-4ebd-a905-35fbaf86bf9b    INFO    Response body: {     "Status": "SUCCESS",     "Reason": "See the details in CloudWatch Log Stream: 2024/06/04/[$LATEST]56e597edf9dd41d8858d40727cb5bdeb",     "PhysicalResourceId": "2024/06/04/[$LATEST]56e597edf9dd41d8858d40727cb5bdeb",     "StackId": "arn:aws:cloudformation:ap-southeast-2:992382856345:stack/workload-discovery-dev-SearchResolversStack-A4NNKBN2F1PY/05eb12d0-2273-11ef-ad12-0697873a487b",     "RequestId": "ca6232e2-beb9-4e8d-be35-fda0e1a23b61",     "LogicalResourceId": "OpenSearchSetup",     "NoEcho": false }
2024-06-04T14:05:26.484Z    6b461cc0-0af9-4ebd-a905-35fbaf86bf9b    INFO    send(..) failed executing https.request(..): AggregateError
2024-06-04T14:05:26.485Z    6b461cc0-0af9-4ebd-a905-35fbaf86bf9b    ERROR   Invoke Error    {     "errorType": "AggregateError",     "errorMessage": "",     "code": "ETIMEDOUT",     "stack": [         "AggregateError [ETIMEDOUT]: ",         "    at internalConnectMultiple (node:net:1117:18)",         "    at afterConnectMultiple (node:net:1684:7)"     ] }
END RequestId: 6b461cc0-0af9-4ebd-a905-35fbaf86bf9b

The ECS task can't fetch the ECR image:

ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.ap-southeast-2.amazonaws.com/": dial tcp 3.104.82.249:443: i/o timeout. Please check your task network configuration.

svozza commented 4 months ago

Are you deploying the solution to an existing VPC?

TRANTANKHOA commented 4 months ago

Yes, with private subnets

svozza commented 4 months ago

The reason for the first timeout is that this custom resource runs in a VPC. For a custom resource to signal to CloudFormation that it has succeeded or failed, it must write its response to an S3 bucket (the presigned ResponseURL in the log above). If the VPC has neither a NAT gateway nor an S3 endpoint, that request has no way to reach S3. The documentation describes how to verify that the VPC you are deploying to has the necessary configuration:

https://docs.aws.amazon.com/solutions/latest/workload-discovery-on-aws/prerequisites.html#verify-your-vpc-configuration
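As a rough local check of the condition described above, the sketch below inspects route-table data shaped like the `RouteTables` list returned by EC2 `DescribeRouteTables` (for example, boto3's `describe_route_tables`, filtered to one VPC). It assumes a subnet can reach S3 if its route table has an active `0.0.0.0/0` route to a NAT gateway, or an active prefix-list route to a `vpce-*` gateway endpoint whose prefix list is S3's. The `s3_prefix_list_ids` argument is something you must look up yourself for your Region, and S3 interface endpoints are not detected by this check. It is a sketch, not a replacement for the verification steps in the linked documentation.

```python
from typing import Any, Iterable, Mapping, Optional, Set


def route_table_for_subnet(
    route_tables: Iterable[Mapping[str, Any]], subnet_id: str
) -> Optional[Mapping[str, Any]]:
    """Return the subnet's explicitly associated route table, else the VPC's main table.

    Assumes every table in `route_tables` belongs to the same VPC.
    """
    tables = list(route_tables)
    for table in tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                return table
    # Subnets without an explicit association use the VPC's main route table.
    for table in tables:
        if any(assoc.get("Main") for assoc in table.get("Associations", [])):
            return table
    return None


def subnet_can_reach_s3(
    route_tables: Iterable[Mapping[str, Any]],
    subnet_id: str,
    s3_prefix_list_ids: Set[str],
) -> bool:
    """Heuristic: True if the subnet has a NAT default route or an S3 gateway endpoint route."""
    table = route_table_for_subnet(route_tables, subnet_id)
    if table is None:
        return False
    for route in table.get("Routes", []):
        if route.get("State") != "active":
            continue  # skip blackhole routes
        # Default route via a NAT gateway: public AWS endpoints, including S3, are reachable.
        if route.get("DestinationCidrBlock") == "0.0.0.0/0" and "NatGatewayId" in route:
            return True
        # S3 gateway endpoint: prefix-list route targeting a vpce-* gateway.
        if route.get("DestinationPrefixListId") in s3_prefix_list_ids and str(
            route.get("GatewayId", "")
        ).startswith("vpce-"):
            return True
    return False
```

A `False` result for the private subnets you passed to the stack is consistent with the `ETIMEDOUT` seen in the OpenSearchSetup log.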

Without a NAT gateway, you will need VPC endpoints for every service listed in the documentation below for the discovery process to work (this should also fix your ECS issue):

https://docs.aws.amazon.com/solutions/latest/workload-discovery-on-aws/aws-apis.html
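To see which endpoints are still missing, one option is to compare the required service names against the `VpcEndpoints` list returned by EC2 `DescribeVpcEndpoints` (boto3's `describe_vpc_endpoints`). The sketch below assumes endpoint service names follow the `com.amazonaws.<region>.<service>` pattern and treats only endpoints in the `available` state as present. The example service list (`ecr.api`, `ecr.dkr`, `s3`) is an assumption that covers only the ECR image pull from the ECS error above; the full list Workload Discovery needs is the one in the linked documentation.

```python
from typing import Any, Iterable, List, Mapping


def missing_endpoints(
    region: str,
    required_services: Iterable[str],
    vpc_endpoints: Iterable[Mapping[str, Any]],
) -> List[str]:
    """Return the sorted endpoint service names that are required but not available.

    `vpc_endpoints` is assumed to have the shape of the `VpcEndpoints` list
    from EC2 DescribeVpcEndpoints, filtered to the target VPC.
    """
    present = {
        ep["ServiceName"]
        for ep in vpc_endpoints
        # Pending or failed endpoints do not route traffic yet.
        if str(ep.get("State", "")).lower() == "available"
    }
    expected = {f"com.amazonaws.{region}.{svc}" for svc in required_services}
    return sorted(expected - present)
```

For instance, a VPC with only an S3 endpoint would report `com.amazonaws.ap-southeast-2.ecr.api` and `com.amazonaws.ap-southeast-2.ecr.dkr` as missing for the ECR example list.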

aws-solutions / workload-discovery-on-aws

Doesn't work out of the box with a new Control Tower AWS IAM account #506