aws-solutions / workload-discovery-on-aws

Workload Discovery on AWS is a solution to visualize AWS Cloud workloads. With it you can build, customize, and share architecture diagrams of your workloads based on live data from AWS. The solution maintains an inventory of the AWS resources across your accounts and regions, mapping their relationships and displaying them in the user interface.
https://aws.amazon.com/solutions/implementations/workload-discovery-on-aws/
Apache License 2.0
722 stars 85 forks

Doesn't work out of the box with a new Control Tower AWS IAM account #506

Open tnh opened 8 months ago

tnh commented 8 months ago

Describe the bug

User: arn:aws:sts::381492128592:assumed-role/AWSReservedSSO_AWSAdministratorAccess_2f70d98355d8c189/tnh@trenthornibrookgmail.onmicrosoft.com is not authorized to perform: config:PutConfigurationRecorder with an explicit deny in a service control policy (Service: AmazonConfig; Status Code: 400; Error Code: AccessDeniedException; Request ID: 72a33ad1-1cc3-4f45-aacb-d9b278e28651; Proxy: null)

To Reproduce

Steps to reproduce the behavior:

  1. new AWS account via cc
  2. set up Control Tower in master account with default settings (which sets up a logging account and OU hierarchy & default SCPs)
  3. vend a new child account
  4. log into new child account
  5. deploy the latest AWS Perspective workload discovery tool


tnh commented 8 months ago

first attempt - the lambda limits:

[Screenshot 2024-02-07 at 8 58 11 am]

this was fixed by raising a quota request, which was annoying because I had to dumpster dive to find the fix.

tnh commented 8 months ago

the second failure was due to the default SCPs post Control Tower deploy

[Screenshot 2024-02-07 at 9 00 08 am]

svozza commented 8 months ago

The config:PutConfigurationRecorder API is fundamental to how Workload Discovery works in SELF_MANAGED mode (that API is invoked every time you import an account through the UI). If you are using Control Tower, then I presume you are working in an AWS Organization, so you should use AWS_ORGANIZATION mode. Just to note, as per the documentation, when deploying in AWS_ORGANIZATION mode, the solution must be deployed in a delegated admin account where StackSets and multi-Region AWS Config capabilities have been enabled.
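
The error in the original report comes from an explicit deny in a service control policy. As a rough illustration of how one might check whether a given SCP document would block `config:PutConfigurationRecorder` before deploying, here is a minimal Python sketch. It is a simplified matcher, not IAM's full evaluation logic: it ignores `Condition`, `Resource`, and `NotAction`, and the sample policy is hypothetical, not the actual Control Tower guardrail.

```python
import fnmatch


def _as_list(value):
    """IAM policy fields may hold a single item or a list of items."""
    if value is None:
        return []
    return [value] if isinstance(value, (str, dict)) else list(value)


def denies_action(policy: dict, action: str) -> bool:
    """Return True if any Deny statement has an Action pattern matching `action`.

    Simplification (assumption): Condition, Resource and NotAction are ignored,
    so True means "may be denied", not "is always denied".
    """
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Deny":
            continue
        for pattern in _as_list(statement.get("Action")):
            # IAM action names are case-insensitive and allow * and ? wildcards.
            if fnmatch.fnmatchcase(action.lower(), pattern.lower()):
                return True
    return False


# Hypothetical SCP, for illustration only.
SAMPLE_SCP = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "config:PutConfigurationRecorder",
                "config:DeleteConfigurationRecorder",
            ],
            "Resource": "*",
        }
    ],
}

if __name__ == "__main__":
    print(denies_action(SAMPLE_SCP, "config:PutConfigurationRecorder"))
```

If a check like this returns True for the account's effective SCPs, importing accounts in SELF_MANAGED mode is likely to hit the same AccessDeniedException.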

TRANTANKHOA commented 4 months ago

I raised the Lambda concurrency limit to get further, and then:

  1. SearchResolversStack was stuck in the state below until the stack creation time exceeded the specified timeout
OpenSearchSetup | - | Custom::OpenSearchSetup | CREATE_IN_PROGRESS
OpenSearchSetupFunction | workload-discovery-dev-Sea-OpenSearchSetupFunction-DdPOtw25siED | AWS::Lambda::Function | CREATE_COMPLETE
...

then failed

  1. workload-discovery-dev-Sea-OpenSearchSetupFunction-**** failed with

    START RequestId: 6b461cc0-0af9-4ebd-a905-35fbaf86bf9b Version: $LATEST
    --
    2024-06-04T14:05:09.088Z    6b461cc0-0af9-4ebd-a905-35fbaf86bf9b    INFO    {  RequestType: 'Delete',  ServiceToken: 'arn:aws:lambda:ap-southeast-2:992382856345:function:workload-discovery-dev-Sea-OpenSearchSetupFunction-DdPOtw25siED',  ResponseURL: 'https://cloudformation-custom-resource-response-apsoutheast2.s3-ap-southeast-2.amazonaws.com/arn%3Aaws%3Acloudformation%3Aap-southeast-2%3A992382856345%3Astack/workload-discovery-dev-SearchResolversStack-A4NNKBN2F1PY/05eb12d0-2273-11ef-ad12-0697873a487b%7COpenSearchSetup%7Cca6232e2-beb9-4e8d-be35-fda0e1a23b61?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20240604T140507Z&X-Amz-SignedHeaders=host&X-Amz-Expires=7200&X-Amz-Credential=AKIA6MM33IIZ4UOX3QFV%2F20240604%2Fap-southeast-2%2Fs3%2Faws4_request&X-Amz-Signature=f5035f740de6927512e4ec301c00362dfa84159f96f03a9f8aab02802c7c5800',  StackId: 'arn:aws:cloudformation:ap-southeast-2:992382856345:stack/workload-discovery-dev-SearchResolversStack-A4NNKBN2F1PY/05eb12d0-2273-11ef-ad12-0697873a487b',  RequestId: 'ca6232e2-beb9-4e8d-be35-fda0e1a23b61',  LogicalResourceId: 'OpenSearchSetup',  PhysicalResourceId: 'workload-discovery-dev-SearchResolversStack-A4NNKBN2F1PY-OpenSearchSetup-1XOPXU678HIXT',  ResourceType: 'Custom::OpenSearchSetup',  ResourceProperties: {    ServiceToken: 'arn:aws:lambda:ap-southeast-2:992382856345:function:workload-discovery-dev-Sea-OpenSearchSetupFunction-DdPOtw25siED',    SolutionVersion: 'v2.1.7'  }}
    2024-06-04T14:05:09.128Z    6b461cc0-0af9-4ebd-a905-35fbaf86bf9b    INFO    Response body: {     "Status": "SUCCESS",     "Reason": "See the details in CloudWatch Log Stream: 2024/06/04/[$LATEST]56e597edf9dd41d8858d40727cb5bdeb",     "PhysicalResourceId": "2024/06/04/[$LATEST]56e597edf9dd41d8858d40727cb5bdeb",     "StackId": "arn:aws:cloudformation:ap-southeast-2:992382856345:stack/workload-discovery-dev-SearchResolversStack-A4NNKBN2F1PY/05eb12d0-2273-11ef-ad12-0697873a487b",     "RequestId": "ca6232e2-beb9-4e8d-be35-fda0e1a23b61",     "LogicalResourceId": "OpenSearchSetup",     "NoEcho": false }
    2024-06-04T14:05:26.484Z    6b461cc0-0af9-4ebd-a905-35fbaf86bf9b    INFO    send(..) failed executing https.request(..): AggregateError
    2024-06-04T14:05:26.485Z    6b461cc0-0af9-4ebd-a905-35fbaf86bf9b    ERROR   Invoke Error    {     "errorType": "AggregateError",     "errorMessage": "",     "code": "ETIMEDOUT",     "stack": [         "AggregateError [ETIMEDOUT]: ",         "    at internalConnectMultiple (node:net:1117:18)",         "    at afterConnectMultiple (node:net:1684:7)"     ] }
    END RequestId: 6b461cc0-0af9-4ebd-a905-35fbaf86bf9b
  2. The ECS task can't fetch the ECR image

    ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.ap-southeast-2.amazonaws.com/": dial tcp 3.104.82.249:443: i/o timeout. Please check your task network configuration.

svozza commented 4 months ago

Are you deploying the solution to an existing VPC?

TRANTANKHOA commented 4 months ago

Yes, with private subnets

svozza commented 4 months ago

The reason for the first timeout is that this custom resource runs in a VPC. For a custom resource to signal to CloudFormation that it has either succeeded or failed, it must write to an S3 bucket; if there is no NAT gateway or S3 endpoint in the VPC, there is no way for this request to reach S3. The documentation describes how to verify that the VPC you are deploying to has the necessary configuration:

https://docs.aws.amazon.com/solutions/latest/workload-discovery-on-aws/prerequisites.html#verify-your-vpc-configuration
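
As a rough complement to the documented checks, the "NAT gateway or S3 endpoint" condition can be sketched in Python. The functions below operate on data shaped like the `RouteTables` and `VpcEndpoints` lists returned by the EC2 `describe_route_tables` and `describe_vpc_endpoints` calls. This is a simplified sketch under stated assumptions: it does not check that a route table is associated with the specific private subnets in use, nor that a gateway endpoint is attached to the right route tables. The helper names and `fetch_vpc_data` are illustrative, not part of the solution.

```python
def has_nat_default_route(route_tables):
    """True if any route table sends 0.0.0.0/0 to a NAT gateway via an active route."""
    for table in route_tables:
        for route in table.get("Routes", []):
            if (
                route.get("DestinationCidrBlock") == "0.0.0.0/0"
                and route.get("NatGatewayId")
                and route.get("State", "active") != "blackhole"
            ):
                return True
    return False


def has_s3_endpoint(vpc_endpoints):
    """True if an available VPC endpoint for S3 exists (gateway or interface)."""
    for endpoint in vpc_endpoints:
        name = endpoint.get("ServiceName", "")
        state = endpoint.get("State", "").lower()
        if name.endswith(".s3") and state == "available":
            return True
    return False


def can_reach_s3(route_tables, vpc_endpoints):
    """Approximate check: the VPC has either a NAT default route or an S3 endpoint."""
    return has_nat_default_route(route_tables) or has_s3_endpoint(vpc_endpoints)


def fetch_vpc_data(vpc_id):
    """Fetch route tables and endpoints for one VPC with boto3 (pagination ignored)."""
    import boto3  # imported lazily so the pure functions above need no dependency

    ec2 = boto3.client("ec2")
    vpc_filter = [{"Name": "vpc-id", "Values": [vpc_id]}]
    route_tables = ec2.describe_route_tables(Filters=vpc_filter)["RouteTables"]
    endpoints = ec2.describe_vpc_endpoints(Filters=vpc_filter)["VpcEndpoints"]
    return route_tables, endpoints
```

With credentials configured, `can_reach_s3(*fetch_vpc_data(vpc_id))` returning False would be consistent with the ETIMEDOUT seen in the custom resource log above.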

Without a NAT gateway, you will need VPC endpoints for every service listed in the documentation below in order for the discovery process to work (this should also fix your ECS issue):

https://docs.aws.amazon.com/solutions/latest/workload-discovery-on-aws/aws-apis.html
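
To compare a VPC against that list, a small Python sketch like the following could report which endpoints are still missing. It assumes endpoint service names follow the common `com.amazonaws.<region>.<service>` pattern (some services use a different naming scheme) and takes the required services as input. The example list is a partial, illustrative one covering only what a private-subnet ECR image pull needs; the authoritative list is the documentation linked above.

```python
def missing_endpoint_services(required, vpc_endpoints, region):
    """Return the required short service names that lack an available VPC endpoint.

    `vpc_endpoints` is shaped like the VpcEndpoints list returned by the EC2
    describe_vpc_endpoints call.
    """
    present = {
        endpoint.get("ServiceName", "")
        for endpoint in vpc_endpoints
        if endpoint.get("State", "").lower() == "available"
    }
    return sorted(
        service
        for service in set(required)
        if f"com.amazonaws.{region}.{service}" not in present
    )


# Partial, illustrative list (assumption): endpoints needed to pull an ECR image.
ECR_PULL_SERVICES = ["ecr.api", "ecr.dkr", "s3"]

if __name__ == "__main__":
    existing = [{"ServiceName": "com.amazonaws.ap-southeast-2.s3", "State": "available"}]
    print(missing_endpoint_services(ECR_PULL_SERVICES, existing, "ap-southeast-2"))
```

An empty result means every service in the supplied list has an available endpoint in the VPC.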