Open tnh opened 8 months ago
First attempt: the Lambda limits.
This was fixed by raising a quota request, which was annoying because I had to dumpster dive to find the fix.
The second failure was due to the default SCPs applied after the Control Tower deployment.
The config:PutConfigurationRecorder API is fundamental to how Workload Discovery works in SELF_MANAGED mode (every time you import an account through the UI, that API is invoked). If you are using Control Tower, then I presume you are working in an AWS Organization, so you should use AWS_ORGANIZATION mode. Just to note, as per the documentation, if deploying in AWS_ORGANIZATION mode, the solution must be deployed in a delegated admin account where StackSets and multi-Region AWS Config capabilities have been enabled.
I raised the Lambda concurrency limit to get further, and then hit:
Stack creation time exceeded the specified timeout
OpenSearchSetup | - | Custom::OpenSearchSetup | CREATE_IN_PROGRESS
OpenSearchSetupFunction | workload-discovery-dev-Sea-OpenSearchSetupFunction-DdPOtw25siED | AWS::Lambda::Function | CREATE_COMPLETE
...
then failed
workload-discovery-dev-Sea-OpenSearchSetupFunction-**** failed with
START RequestId: 6b461cc0-0af9-4ebd-a905-35fbaf86bf9b Version: $LATEST
--
2024-06-04T14:05:09.088Z 6b461cc0-0af9-4ebd-a905-35fbaf86bf9b INFO { RequestType: 'Delete', ServiceToken: 'arn:aws:lambda:ap-southeast-2:992382856345:function:workload-discovery-dev-Sea-OpenSearchSetupFunction-DdPOtw25siED', ResponseURL: 'https://cloudformation-custom-resource-response-apsoutheast2.s3-ap-southeast-2.amazonaws.com/arn%3Aaws%3Acloudformation%3Aap-southeast-2%3A992382856345%3Astack/workload-discovery-dev-SearchResolversStack-A4NNKBN2F1PY/05eb12d0-2273-11ef-ad12-0697873a487b%7COpenSearchSetup%7Cca6232e2-beb9-4e8d-be35-fda0e1a23b61?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20240604T140507Z&X-Amz-SignedHeaders=host&X-Amz-Expires=7200&X-Amz-Credential=AKIA6MM33IIZ4UOX3QFV%2F20240604%2Fap-southeast-2%2Fs3%2Faws4_request&X-Amz-Signature=f5035f740de6927512e4ec301c00362dfa84159f96f03a9f8aab02802c7c5800', StackId: 'arn:aws:cloudformation:ap-southeast-2:992382856345:stack/workload-discovery-dev-SearchResolversStack-A4NNKBN2F1PY/05eb12d0-2273-11ef-ad12-0697873a487b', RequestId: 'ca6232e2-beb9-4e8d-be35-fda0e1a23b61', LogicalResourceId: 'OpenSearchSetup', PhysicalResourceId: 'workload-discovery-dev-SearchResolversStack-A4NNKBN2F1PY-OpenSearchSetup-1XOPXU678HIXT', ResourceType: 'Custom::OpenSearchSetup', ResourceProperties: { ServiceToken: 'arn:aws:lambda:ap-southeast-2:992382856345:function:workload-discovery-dev-Sea-OpenSearchSetupFunction-DdPOtw25siED', SolutionVersion: 'v2.1.7' }}
2024-06-04T14:05:09.128Z 6b461cc0-0af9-4ebd-a905-35fbaf86bf9b INFO Response body: { "Status": "SUCCESS", "Reason": "See the details in CloudWatch Log Stream: 2024/06/04/[$LATEST]56e597edf9dd41d8858d40727cb5bdeb", "PhysicalResourceId": "2024/06/04/[$LATEST]56e597edf9dd41d8858d40727cb5bdeb", "StackId": "arn:aws:cloudformation:ap-southeast-2:992382856345:stack/workload-discovery-dev-SearchResolversStack-A4NNKBN2F1PY/05eb12d0-2273-11ef-ad12-0697873a487b", "RequestId": "ca6232e2-beb9-4e8d-be35-fda0e1a23b61", "LogicalResourceId": "OpenSearchSetup", "NoEcho": false }
2024-06-04T14:05:26.484Z 6b461cc0-0af9-4ebd-a905-35fbaf86bf9b INFO send(..) failed executing https.request(..): AggregateError
2024-06-04T14:05:26.485Z 6b461cc0-0af9-4ebd-a905-35fbaf86bf9b ERROR Invoke Error { "errorType": "AggregateError", "errorMessage": "", "code": "ETIMEDOUT", "stack": [ "AggregateError [ETIMEDOUT]: ", " at internalConnectMultiple (node:net:1117:18)", " at afterConnectMultiple (node:net:1684:7)" ] }
END RequestId: 6b461cc0-0af9-4ebd-a905-35fbaf86bf9b
The ECS task can't fetch the ECR image:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.ap-southeast-2.amazonaws.com/": dial tcp 3.104.82.249:443: i/o timeout. Please check your task network configuration.
Are you deploying the solution to an existing VPC?
Yes, with private subnets
The reason for the first timeout is that this custom resource runs in a VPC, and in order for a custom resource to signal to CloudFormation that it has either succeeded or failed, it must write to an S3 bucket. If there is no NAT gateway or S3 endpoint in the VPC, there is no way for this request to reach S3. There is documentation to verify if the VPC you are deploying to has the necessary configuration:
Without a NAT gateway, you will need VPC endpoints for every service listed in the documentation below in order for the Discovery process to work (this should also fix your ECS issue):
https://docs.aws.amazon.com/solutions/latest/workload-discovery-on-aws/aws-apis.html
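As a rough way to sanity-check a VPC before deploying, the logic above can be sketched in Python. This is only an illustration, not part of the solution: `missing_connectivity` is a hypothetical helper, and it assumes you have already fetched data shaped like the EC2 `DescribeVpcEndpoints` and `DescribeRouteTables` responses (for example with boto3). The two service names used in the example, `s3` and `ecr.api`, are just the ones that show up in the errors in this thread; the full list is in the documentation linked above.

```python
from typing import Dict, List


def missing_connectivity(
    vpc_id: str,
    region: str,
    required_services: List[str],
    vpc_endpoints: List[Dict],
    route_tables: List[Dict],
) -> List[str]:
    """Return the required services that the VPC has no obvious route to.

    vpc_endpoints and route_tables are assumed to look like the
    "VpcEndpoints" and "RouteTables" lists returned by EC2.
    """
    # Simplification: any active default route through a NAT gateway counts.
    # This does not verify that the specific private subnets use that table.
    has_nat = any(
        route.get("NatGatewayId")
        and route.get("State") == "active"
        and route.get("DestinationCidrBlock") == "0.0.0.0/0"
        for table in route_tables
        if table.get("VpcId") == vpc_id
        for route in table.get("Routes", [])
    )
    if has_nat:
        return []

    # Without NAT, each service needs its own endpoint in the available state.
    available = {
        endpoint["ServiceName"]
        for endpoint in vpc_endpoints
        if endpoint.get("VpcId") == vpc_id
        and endpoint.get("State", "").lower() == "available"
    }
    return [
        service
        for service in required_services
        if f"com.amazonaws.{region}.{service}" not in available
    ]


if __name__ == "__main__":
    # Hypothetical VPC with an S3 endpoint only and no NAT gateway.
    endpoints = [
        {
            "VpcId": "vpc-example",
            "ServiceName": "com.amazonaws.ap-southeast-2.s3",
            "State": "available",
        }
    ]
    print(missing_connectivity(
        "vpc-example", "ap-southeast-2", ["s3", "ecr.api"], endpoints, []
    ))
```

An empty result only means nothing is obviously missing; security groups, endpoint policies, and private DNS settings can still block the traffic.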