aws-solutions / workload-discovery-on-aws

Workload Discovery on AWS is a solution to visualize AWS Cloud workloads. With it you can build, customize, and share architecture diagrams of your workloads based on live data from AWS. The solution maintains an inventory of the AWS resources across your accounts and regions, mapping their relationships and displaying them in the user interface.
https://aws.amazon.com/solutions/implementations/workload-discovery-on-aws/
Apache License 2.0

Discovery not working with no logs #503

Closed by WCottrell4 9 months ago

WCottrell4 commented 9 months ago


Describe the bug
The Discovery process is not working properly. I have gone through the steps to import a new account, but nothing is coming through in Workload Discovery. I have looked at the debugging suggestions, but I cannot find any logs. The /ecs/workload-discovery-cluster logs contain no new entries, even though the task appears to be running every 15 minutes.

Expected behavior
I would expect there to be logs and for the discovery process to be functioning properly.

Browser
Edge, latest version

svozza commented 9 months ago

You should be able to get to the logs by finding the last stopped ECS task in the ECS console for the cluster named workload-discovery-cluster. If there are no stopped tasks then that means there is very likely some configuration issue (possibly an SCP) in your account that is preventing the task from launching.
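
The console lookup described above can also be scripted. The following is a minimal sketch, assuming boto3 is installed and credentials for the deployment account are configured. The helper names are hypothetical and not part of the solution. `latest_stopped_task` only inspects task descriptions, so it can be tried without AWS access.

```python
from datetime import datetime, timezone
from typing import Optional


def latest_stopped_task(tasks: list) -> Optional[dict]:
    """Return the most recently stopped task from a describe_tasks-style list."""
    stopped = [t for t in tasks if t.get("stoppedAt") is not None]
    if not stopped:
        # No stopped tasks at all suggests the task never launched.
        return None
    return max(stopped, key=lambda t: t["stoppedAt"])


def fetch_stopped_tasks(cluster: str = "workload-discovery-cluster") -> list:
    """Query ECS for stopped tasks (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without it

    ecs = boto3.client("ecs")
    arns = ecs.list_tasks(cluster=cluster, desiredStatus="STOPPED")["taskArns"]
    if not arns:
        return []
    return ecs.describe_tasks(cluster=cluster, tasks=arns)["tasks"]


if __name__ == "__main__":
    task = latest_stopped_task(fetch_stopped_tasks())
    if task is None:
        print("No stopped tasks found; the task may be failing to launch.")
    else:
        print(task["taskArn"], task.get("stoppedReason"))
```

ECS only keeps stopped tasks visible for a limited time, so an empty result shortly after a scheduled run is more telling than one taken hours later.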

[Screenshot 2024-01-31 at 12 58 51]

WCottrell4 commented 9 months ago

Thank you for the help. We found this yesterday and discovered the task was failing before it was completed, which is why there were no logs. It was failing because the task security group was too restrictive for it to complete.
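
The thread does not say which rule was missing. As an illustration only, the sketch below assumes the task needs outbound HTTPS (TCP 443) to reach AWS APIs and checks a security group description for such a rule. The dictionary follows the shape returned by EC2 `describe_security_groups`, and `allows_outbound_https` is a hypothetical helper.

```python
def allows_outbound_https(security_group: dict) -> bool:
    """Return True if any egress rule permits TCP 443 to some destination.

    Assumption: outbound HTTPS is what the task needs; the actual
    requirement in a given deployment may differ.
    """
    destination_keys = ("IpRanges", "Ipv6Ranges", "PrefixListIds", "UserIdGroupPairs")
    for rule in security_group.get("IpPermissionsEgress", []):
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols and all ports.
            port_ok = True
        elif protocol == "tcp":
            port_ok = rule.get("FromPort", 0) <= 443 <= rule.get("ToPort", 65535)
        else:
            port_ok = False
        has_destination = any(rule.get(key) for key in destination_keys)
        if port_ok and has_destination:
            return True
    return False


# Example: a group that only allows outbound PostgreSQL traffic.
restrictive = {
    "IpPermissionsEgress": [
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]}
    ]
}
print(allows_outbound_https(restrictive))
```

A rule that passes this check can still be too narrow if its destination does not include the endpoints the task calls, so treat a `True` result as necessary rather than sufficient.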

WCottrell4 commented 9 months ago

We also initially got a 403 error when trying to view a diagram in the UI. When looking at the Web ACL, we noticed that logging was disabled, and once we enabled it the diagrams worked perfectly. Not sure why that would have made a difference, but it truly was the only change made before the diagrams displayed as they should. The tool is working properly now, so I need no further help.

svozza commented 9 months ago

It might have been that the ApiAllowListedRanges parameter had been changed. That parameter is for WAF but it only refers to the AppSync API not the UI. We need to clarify that more in the docs because it's caused issues for others too.

WCottrell4 commented 9 months ago

Hello, I am running into an intermittent issue where diagrams will not generate or expand. I get a 403 error at random times: the tool stops working for an extended period and then randomly starts working again. This is similar to the issue mentioned earlier in this thread, but none of the fixes that worked previously resolve it. Any idea why this would happen?

svozza commented 9 months ago

Where are the 403 errors appearing? In the WebUI? What is the ApiAllowListedRanges CFN parameter that I mentioned earlier set to currently?

WCottrell4 commented 9 months ago

The 403 is appearing in the UI, and the browser console shows "Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'reduce')". There might also be another error being thrown in the console, but the tool is working right now so I can't reproduce the error at the moment.

WCottrell4 commented 9 months ago

The ApiAllowListedRanges parameter is set to the default, 0.0.0.0/1,128.0.0.0/1
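
For reference, the two /1 blocks in that default together cover the entire IPv4 address space, so the default value should not exclude any IPv4 client. The sketch below checks this with Python's standard `ipaddress` module. `parse_ranges` and `is_allowed` are hypothetical helpers for illustration, not part of the solution.

```python
import ipaddress


def parse_ranges(value: str) -> list:
    """Split a comma-separated CIDR list into network objects."""
    return [ipaddress.ip_network(part.strip()) for part in value.split(",") if part.strip()]


def is_allowed(client_ip: str, ranges: list) -> bool:
    """Return True if client_ip falls inside any of the given networks."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in net for net in ranges)


default_ranges = parse_ranges("0.0.0.0/1,128.0.0.0/1")

# Each /1 holds 2**31 addresses; together they span all 2**32 IPv4 addresses.
total = sum(net.num_addresses for net in default_ranges)
print(total == 2**32)

# A narrower allow list would reject clients outside it.
office_only = parse_ranges("192.0.2.0/24")
print(is_allowed("10.1.2.3", office_only))
```

The same helpers can be pointed at a customized parameter value to check whether a particular client address would be let through.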

svozza commented 9 months ago

Very odd. I've not seen this before. The next time it happens, try to see the network request to the searchResources GraphQL query. Does it look like this or is there a field called errors there instead of resources?
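
The same check can be applied to a response body copied from the browser's network tab. This is a rough sketch: it assumes the resource list sits at `data.searchResources.resources`, which the thread does not confirm, and `classify_search_response` is a hypothetical helper.

```python
import json


def classify_search_response(body: str) -> str:
    """Classify a GraphQL response body as 'errors', 'resources', or 'missing'.

    Assumption: a successful searchResources response nests its results
    under data.searchResources.resources.
    """
    payload = json.loads(body)
    if payload.get("errors"):
        # GraphQL reports failures in a top-level "errors" list,
        # often alongside an HTTP 200 status.
        return "errors"
    data = payload.get("data") or {}
    result = data.get("searchResources") or {}
    if isinstance(result.get("resources"), list):
        return "resources"
    return "missing"


ok_body = '{"data": {"searchResources": {"resources": []}}}'
failed_body = '{"data": null, "errors": [{"message": "example failure"}]}'
print(classify_search_response(ok_body))
print(classify_search_response(failed_body))
```

Because GraphQL APIs commonly return HTTP 200 even when the body carries errors, the status code alone does not settle which case occurred.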

[Screenshot 2024-02-13 at 17 15 46]

WCottrell4 commented 9 months ago

I have not been able to recreate the error, but I was looking at that this morning while I was having the problem, and all the calls to the GraphQL API were showing 200.