aws / amazon-ecs-service-connect-agent

Amazon ECS Service Connect Agent
Apache License 2.0
27 stars 10 forks source link

Filter Envoy stats query with appmesh string #45

Closed suniltheta closed 1 year ago

suniltheta commented 1 year ago

Summary

Include filter=appmesh while querying for stats endpoint

Implementation details

Include filter=appmesh query param

Testing

New tests cover the changes: yes

Description for the changelog

Include filter=appmesh while querying for stats endpoint

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

karanvasnani commented 1 year ago

When I last looked at this there were some combinations of filters I remember were not supported. Can you attach a stats dump with and without this change that confirms the function of both usedonly and filter=appmesh in tandem?

ytsssun commented 1 year ago

When I last looked at this there were some combinations of filters I remember were not supported. Can you attach a stats dump with and without this change that confirms the function of both usedonly and filter=appmesh in tandem?

+1 can you confirm the the behavior here? Also wanted to see whether this would impact the metrics filtering logic we have today (we support filter=metrics_extension, which I manually filtered the metrics so that it only contains appmesh metrics).

suniltheta commented 1 year ago

When I last looked at this there were some combinations of filters I remember were not supported.

In our internal discussion we noted that using 9901/stats/prometheus?filter=appmesh&usedonly doesn't work but doing 9901/stats?format=prometheus&filter=appmesh&usedonly does work. But when I checked on v1.27.0.0 both these queries worked. I couldn't find what fixed in Envoy upstream to make the former one work. Since agent is anyway doing 9901/stats?format=prometheus&usedonly for actually querying stats form Envoy I just included filter=appmesh in addition.

Can you attach a stats dump with and without this change that confirms the function of both usedonly and filter=appmesh in tandem?

These are the bunch of testing that I did.

curl -s --unix-socket /tmp/envoy_admin.sock "http://unix/stats/prometheus?filter=appmesh&usedonly" > /var/log/stats1.txt
curl -s --unix-socket /tmp/envoy_admin.sock "http://unix/stats/prometheus?filter=appmesh" > /var/log/stats2.txt
curl -s --unix-socket /tmp/envoy_admin.sock "http://unix/stats/prometheus?usedonly" > /var/log/stats3.txt
curl -s --unix-socket /tmp/envoy_admin.sock "http://unix/stats?format=prometheus&usedonly&filter=appmesh" > /var/log/stats4.txt
curl -s --unix-socket /tmp/envoy_admin.sock "http://unix/stats?format=prometheus&filter=appmesh&usedonly" > /var/log/stats5.txt
curl -s --unix-socket /tmp/envoy_admin.sock "http://unix/stats?format=prometheus&usedonly" > /var/log/stats6.txt
curl -s --unix-socket /tmp/envoy_admin.sock "http://unix/stats?format=prometheus&filter=appmesh" > /var/log/stats7.txt
curl -s --unix-socket /tmp/envoy_admin.sock "http://unix/stats?format=prometheus" > /var/log/stats8.txt
curl -s --unix-socket /tmp/envoy_admin.sock "http://unix/stats/prometheus" > /var/log/stats9.txt
bash-4.2$ ls -lrth stats*
-rw-r--r-- 1 20000 root 6.1K Aug 29 05:06 stats1.txt
-rw-r--r-- 1 20000 root 7.0K Aug 29 05:06 stats2.txt
-rw-r--r-- 1 20000 root 101K Aug 29 05:06 stats3.txt
-rw-r--r-- 1 20000 root 6.1K Aug 29 05:06 stats4.txt
-rw-r--r-- 1 20000 root 6.1K Aug 29 05:06 stats5.txt
-rw-r--r-- 1 20000 root 101K Aug 29 05:06 stats6.txt
-rw-r--r-- 1 20000 root 7.0K Aug 29 05:06 stats7.txt
-rw-r--r-- 1 20000 root 204K Aug 29 05:07 stats8.txt
-rw-r--r-- 1 20000 root 204K Aug 29 05:07 stats9.txt

stats1.txt stats2.txt stats3.txt stats4.txt stats5.txt stats6.txt stats7.txt stats8.txt stats9.txt delta_query_with_fix.txt delta_query_without_fix.txt


suniltheta commented 1 year ago

Also wanted to see whether this would impact the metrics filtering logic we have today (we support filter=metrics_extension, which I manually filtered the metrics so that it only contains appmesh metrics).

The only query which ECS Agent makes that needs optimization https://github.com/aws/amazon-ecs-agent/blob/82280d81c0b7e48b6c4d9ddca9f6670d3888d787/agent/engine/serviceconnect/manager_linux.go#L78

In last 2 commits on this PR addressed a bug (from first commit) where we don't want to include filter=appmesh by default. we would be also using agent to query plain /stats/prometheus to check for listener stats (found this by running integ tests).

Now we include filter=appmesh only when the request to Agent includes filter=metrics_extension. While in manual filtering we look for string "envoyappmesh", all of those should be captured with filter=appmesh.

We can still kept the logic to confirm that we are only picking the metrics with prefix in L174 extendedMetricsPrefix="envoy_appmesh_", just in case.

https://github.com/aws/amazon-ecs-service-connect-agent/blob/5e22678921bcc7f32ec39f6fd2e3af503cd98463/agent/stats/envoy_prometheus_stats.go#L174