aws / copilot-cli

The AWS Copilot CLI is a tool for developers to build, release and operate production ready containerized applications on AWS App Runner or Amazon ECS on AWS Fargate.
https://aws.github.io/copilot-cli/
Apache License 2.0
3.53k stars, 417 forks

[Bug]: Tracing Overloads Cluster CPU & Memory #5900

Open Jordan-Eckowitz opened 3 months ago

Jordan-Eckowitz commented 3 months ago

Description:

When I turned on observability (X-Ray tracing), the number of tasks in my cluster scaled up to the maximum I had set (10), with CPU at 100% and memory in the 80% range. This was in an instance which had no active users at the time. When I disabled tracing, it dropped back down to 1 task, ~2% CPU and 15% memory. No code was changed, so we were able to isolate the issue to the X-Ray tracing daemon.

It was enabled with the following in my manifest file:

observability:
  tracing: awsxray
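
For anyone trying to reproduce this, a minimal manifest sketch combining the tracing setting with the autoscaling cap described above might look like the following. Only `tracing: awsxray` and the maximum of 10 tasks come from this report; the service name, service type, Dockerfile path, CPU/memory sizes, minimum task count, and scaling threshold are hypothetical placeholders, not the actual configuration.

```yaml
# Hypothetical Copilot service manifest; values marked "assumed" are not from the report.
name: api                  # assumed service name
type: Backend Service      # assumed service type

image:
  build: ./Dockerfile      # assumed build path

cpu: 256                   # assumed task CPU units
memory: 512                # assumed task memory in MiB

count:
  range: 1-10              # maximum of 10 tasks, as in the report; minimum of 1 assumed
  cpu_percentage: 70       # assumed target-tracking threshold

observability:
  tracing: awsxray         # the setting that triggered the scale-up
```

With a CPU-based scaling target like this, any CPU consumed by the tracing sidecar counts toward the service's average utilization, so a busy sidecar alone could push the task count to the top of the range even with no user traffic.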

Details:

Observed result:

As mentioned above, there was no traffic at the time. The logs showed activity from the tracing daemon, which made us suspect that it was the root cause. Removing the service immediately corrected the issue. Unfortunately, after removing the service, we lost the tracing logs.

Lou1415926 commented 3 months ago

Hi there! For backend services, we run the aws-otel-collector image as a sidecar to your service. Your experiment does suggest that this sidecar is the most likely cause of the resource usage you saw. However, Copilot is only a consumer of the image. Would you be able to raise this on the aws-otel-collector GitHub repository so that their team is aware of it?

We could potentially offer the option to use a different image for observability, but I think that would make this a feature request instead of a bug 🤔. I will remove the [type/bug] label for now, as it looks like this behavior is not caused by Copilot logic. Please let me know if you disagree!