aws-solutions / workload-discovery-on-aws

Workload Discovery on AWS is a solution to visualize AWS Cloud workloads. With it you can build, customize, and share architecture diagrams of your workloads based on live data from AWS. The solution maintains an inventory of the AWS resources across your accounts and regions, mapping their relationships and displaying them in the user interface.
https://aws.amazon.com/solutions/implementations/workload-discovery-on-aws/
Apache License 2.0
718 stars 85 forks

Discrepancies in UI for resources #508

Open psjd23 opened 7 months ago

psjd23 commented 7 months ago

If your issue relates to the Discovery Process, please first follow the steps described in the implementation guide: Debugging the Discovery Component.


Describe the bug

After installing the solution in the management account (due to pre-existing technical debt and the absence of a delegated admin account), I've encountered a discrepancy in the visibility of EC2 instances. Despite meeting the prerequisites for AWS Config and setting it to AWS Organizations mode, the Resources table does not reflect all instances accurately.

To Reproduce

The exact steps to reproduce this behavior are currently unknown.

Expected behavior

All EC2 instances should be accurately displayed in the Resources table.

Screenshots

[Four screenshots attached to the original issue]

Additional context

This issue may not be limited to EC2 instances; however, they are the primary focus of my troubleshooting efforts. I'm using the aws-controltower-ConfigAggregatorForOrganizations aggregator, and AWS Config is enabled. I'm wondering whether leaving the ConfigAggregatorName parameter unset and allowing the solution to provision its necessary components could resolve this issue.

Following the flowchart, I saw no spikes above 75%. The OpenSearch average is 30%, and Neptune is 15%. Neptune had an initial spike to 70% during its first few minutes of monitored data. I saw no OOM errors in the ECS tasks. The target regions are us-east-1 and us-west-2, which together account for 99% of our resources.

Parameters for CFN template:

AccountType                  MANAGEMENT
AdminUserEmailAddress        redacted
AlreadyHaveConfigSetup       Yes
ApiAllowListedRanges         0.0.0.0/1,128.0.0.0/1
AthenaWorkgroup              primary
ConfigAggregatorName         aws-controltower-ConfigAggregatorForOrganizations
CpuUnits                     1 vCPU
CreateNeptuneReplica         No
CreateOpensearchServiceRole  Yes
CrossAccountDiscovery        AWS_ORGANIZATIONS
DiscoveryTaskFrequency       15mins
MaxNCUs                      3
Memory                       2048
MinNCUs                      1
NeptuneInstanceClass         db.t4g.medium
OpensearchInstanceType       m6g.large.search
OpensearchMultiAz            No
OrganizationUnitId           r-XXXX
PrivateSubnet0               subnet-redacted
PrivateSubnet1               subnet-redacted
VpcCidrBlock                 10.111.0.0/16
VpcId                        vpc-redacted

ECS logs from the latest run: log-events-viewer-result.1.csv

Please let me know what other data you need and I will get it ASAP. Thanks!

svozza commented 7 months ago

Thank you for such a detailed error report; it was very useful for ruling out issues. You should be fine to use the Control Tower aggregator; I have done so myself with no issues.

As a sanity check, could you go to the advanced query in Config, run the query SELECT * WHERE resourceType = 'AWS::EC2::Instance' on the Control Tower aggregator, and verify that the missing EC2 instances are in the aggregator? If they are, could you update one of the missing instances, e.g., add a tag, to trigger an update to the Config configuration item, and then check whether the EC2 instance appears in the WD UI the next time the discovery process runs? Bear in mind that the discovery process only runs every fifteen minutes, so it might take a while for it to update.
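
For anyone who prefers to run this check from a script rather than the console, here is a minimal sketch using the boto3 `select_aggregate_resource_config` call. It selects three named properties instead of `*`, and the helper name `list_aggregated_instances` is made up for illustration. It assumes you pass in a Config client created in the account and region that own the aggregator.

```python
import json

# Same filter as the console query above, but selecting explicit properties.
QUERY = (
    "SELECT resourceId, accountId, awsRegion "
    "WHERE resourceType = 'AWS::EC2::Instance'"
)


def list_aggregated_instances(config_client, aggregator_name, expression=QUERY):
    """Return every row the aggregator holds for the query, following NextToken."""
    rows = []
    kwargs = {
        "Expression": expression,
        "ConfigurationAggregatorName": aggregator_name,
    }
    while True:
        page = config_client.select_aggregate_resource_config(**kwargs)
        # Each entry in Results is a JSON-encoded string, one per resource.
        rows.extend(json.loads(raw) for raw in page.get("Results", []))
        token = page.get("NextToken")
        if not token:
            return rows
        kwargs["NextToken"] = token


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   client = boto3.client("config")
#   rows = list_aggregated_instances(
#       client, "aws-controltower-ConfigAggregatorForOrganizations")
#   print(len(rows))
```

The pagination loop matters here because a single call returns only one page of results, so a count taken from the first page alone could understate what the aggregator holds.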

psjd23 commented 7 months ago

Thanks for the quick response.

I ran the query and saw many results, so I did SELECT COUNT(*) WHERE resourceType = 'AWS::EC2::Instance' and the result was 266.
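
A single total like this does not show where the missing instances live. As a rough sketch, the aggregator rows could be broken down by account and region on the client side and then compared with what the WD UI shows for each account. This assumes the rows come from a query that selects `accountId` and `awsRegion`, in the JSON-string form that Config advanced queries return; the function name is hypothetical.

```python
import json
from collections import Counter


def count_by_account_and_region(results):
    """Count rows per (accountId, awsRegion) pair.

    Accepts rows either as JSON strings (the raw Results entries)
    or as already-decoded dicts.
    """
    counts = Counter()
    for raw in results:
        row = json.loads(raw) if isinstance(raw, str) else raw
        counts[(row["accountId"], row["awsRegion"])] += 1
    return counts


# Example with made-up account IDs:
sample = [
    '{"resourceId": "i-aaa", "accountId": "111111111111", "awsRegion": "us-east-1"}',
    '{"resourceId": "i-bbb", "accountId": "111111111111", "awsRegion": "us-east-1"}',
    '{"resourceId": "i-ccc", "accountId": "222222222222", "awsRegion": "us-west-2"}',
]
for (account, region), n in sorted(count_by_account_and_region(sample).items()):
    print(account, region, n)
```

An account or region whose count is well above what the UI lists would be the first place to look.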

I made a change to an EC2 instance an hour ago (added a tag) and it did not update in the WD UI. I didn't see any errors in the ECS logs, only normal exits.
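
For repeating this test on several instances, the tag change could be scripted. This is a minimal sketch using the boto3 EC2 `create_tags` call. The tag key `wd-debug-touch` and the helper name are made up for illustration, and the client is assumed to be created in the account and region that own the instance.

```python
from datetime import datetime, timezone


def touch_instance(ec2_client, instance_id, key="wd-debug-touch"):
    """Add or overwrite a tag so that Config records a new configuration item."""
    # A timestamp value makes each run a real change to the instance's tags.
    value = datetime.now(timezone.utc).isoformat()
    ec2_client.create_tags(
        Resources=[instance_id],
        Tags=[{"Key": key, "Value": value}],
    )
    return value


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   touch_instance(ec2, "i-0123456789abcdef0")  # placeholder instance ID
```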

svozza commented 7 months ago

Hmm, this is very odd. If you're happy to do so, I would like to add some logging to the discovery process so I can try to get more information. My email address is my GitHub user handle at amazon dot com, and we can co-ordinate there, as I will need to give you a special build with the extra logging.

psjd23 commented 7 months ago

Thanks, email has been sent with title "GH Issue 508".