Spaced-Out / ecs-container-exporter

AWS ECS sidecar that exports ECS container-level Docker stats metrics to Prometheus and also publishes them via StatsD.
MIT License

ECS Fargate CPU usage calculated incorrectly? #6

Open marksumm opened 2 years ago

marksumm commented 2 years ago

I did some experiments running instances of containerstack/cpustress in ECS Fargate on Linux with a task CPU limit of 1 vCPU (1024 shares). The instances were configured to always use 100% of the available CPU resources, by running with arguments --cpu=2 to start 2 worker threads and --timeout=86400 to prevent early termination. I've noticed some strange behaviour from the exporter...

1) ecs_task_cpu_limit is returned correctly in CPU shares (1024)

2) ecs_task_cpu_usage_ratio is correctly scaled between 0 and 1 when exported_container_name="task"

3) ecs_task_cpu_usage_ratio is returned in CPU shares and not scaled between 0 and 1 when exported_container_name != "task". Also, it seems to be out by a factor of 2.

marksumm commented 2 years ago

I think I've found the cause of the issues: the container CPU limit is being returned as 2 (presumably due to default Linux cgroup behaviour) when it's actually 0 in the ECS task definition. This causes normalize_cpu_usage to apply case 4 scaling when case 3 is expected.
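
To illustrate the suspected cause, here is a minimal Python sketch. It is not the exporter's actual code: the names effective_cpu_limit and usage_ratio are hypothetical, and it assumes that usage is expressed in CPU shares (1024 = 1 vCPU) and that a reported container limit of 2 or less means no limit was set in the task definition.

```python
# Hypothetical sketch, not the exporter's real normalize_cpu_usage.
# Assumption: a container limit reported as 2 (or less) corresponds to
# a value of 0 in the ECS task definition, i.e. "no container limit".
MIN_CGROUP_CPU_SHARES = 2


def effective_cpu_limit(container_limit: int, task_limit: int) -> int:
    """Pick the CPU-share limit that usage should be scaled against."""
    if container_limit <= MIN_CGROUP_CPU_SHARES:
        # No real container limit: fall back to the task-level limit.
        return task_limit
    return container_limit


def usage_ratio(used_shares: float, container_limit: int, task_limit: int) -> float:
    """Scale usage (in CPU shares) to a ratio of the effective limit."""
    limit = effective_cpu_limit(container_limit, task_limit)
    if limit <= 0:
        # Nothing sensible to scale against.
        return 0.0
    return used_shares / limit
```

With a task limit of 1024 shares, a container that reports a limit of 2 and uses 1024 shares would then be scaled against the task limit, rather than being treated as a container with a genuine 2-share limit.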

marksumm commented 2 years ago

After experimenting with different CPU limits on individual containers, I've realised that the values returned for ecs_task_percpu_usage_ratio are inexplicable. I would expect them to be affected by the same issues as ecs_task_cpu_usage_ratio and also to sum to the same value, but it appears not to be the case.
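
The expectation that the per-CPU ratios sum to the overall ratio can be written as a small check. This is only a sketch: percpu_matches_total is a hypothetical helper, the 5% tolerance is an arbitrary assumption, and the inputs would have to be copied from scraped values of ecs_task_percpu_usage_ratio and ecs_task_cpu_usage_ratio for the same container and timestamp.

```python
import math


def percpu_matches_total(percpu_ratios, total_ratio, rel_tol=0.05):
    """Return True if the per-CPU ratios sum to roughly the overall ratio.

    rel_tol is an assumed tolerance to absorb sampling jitter between
    the two metrics; abs_tol keeps the comparison sane near zero.
    """
    return math.isclose(
        sum(percpu_ratios), total_ratio, rel_tol=rel_tol, abs_tol=1e-9
    )
```

A result of False for values sampled at the same time would support the observation that the two metrics are not being scaled consistently.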