Open Jordan-Eckowitz opened 3 months ago
Hi there! For backend services, we run the aws-otel-collector image as a sidecar to your service. Your experiment suggests that this sidecar is the most likely cause of the usage you saw. However, Copilot is only a consumer of the image. Would you be able to raise this on their GitHub repository so that the team is aware of it?
We could potentially offer the option to use a different image for observability, but I think that would make this a feature request instead of a bug 🤔. I will remove the [type/bug] label for now, as it looks like this behavior is not caused by Copilot logic. Please let me know if you disagree!
Description:
When I turned on observability (X-Ray tracing), the number of tasks in my cluster scaled up and maxed out (I set a maximum of 10), with CPU at 100% and memory in the 80% range. This was in an instance which had no active users at the time. When I disabled tracing, it dropped back down to 1 task, ~2% CPU and 15% memory. No code was changed, so we were able to isolate the issue to the X-Ray tracing daemon.
It was enabled with the following in my manifest file:
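For reference, a minimal sketch of how tracing is typically enabled in a Copilot service manifest, going by the Copilot manifest documentation. This is an assumption about the shape of the setting, not a copy of the exact manifest used here:

```yaml
# Assumed shape of the setting, per the Copilot manifest documentation;
# not the reporter's actual manifest.
observability:
  tracing: awsxray
```

Removing this block (or the `tracing` key) and redeploying is what disabling tracing amounts to in this sketch.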
Details:
Observed result:
As mentioned above, there was no traffic at the time. The logs showed activity from the tracing daemon, which made us suspect it was the root cause. Removing the service immediately corrected the issue. Unfortunately, after removing the service we lost the tracing logs.