aws / aws-app-mesh-roadmap

AWS App Mesh is a service mesh that you can use with your microservices to manage service-to-service communication.
Apache License 2.0

Feature Request: Managed CloudWatch metrics based on Envoy stats #61

Open rclark opened 5 years ago

rclark commented 5 years ago

Tell us about your request

I would like to see app-mesh provide some level of out-of-the-box integration with CloudWatch. This would be an extremely useful "value-add" to present to teams looking into adopting app-mesh for their application architecture.

Which integration(s) is this request for?

Ideally, this would cover any of the potential integrations, since it's based on collection from Envoy stats, which are consistent across integrations.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Currently, envoy containers can be configured to output DogStatsD-compatible metrics, but the documentation gives no instructions for using that output to begin accumulating metrics in CloudWatch.

Outside of using app-mesh, I have in the past added cloudwatch agent sidecar containers to my ECS tasks in awsvpc networking mode and configured envoy to send metrics to it. However this is non-trivial, as the cw-agent is not well-designed for running in a docker container. Getting this to work involved reverse-engineering various shell scripts involved in configuring the agent.

Furthermore, learning the range of statistics emitted by Envoy and reducing them to metrics that you're interested in represents another undocumented (by AWS) learning curve.

I believe that the team should make some opinionated decisions about metrics that would be automatically aggregated to CloudWatch from envoy containers in your mesh. Further documentation about how to configure those metrics to meet your needs should also be included. If the implementation requires the customer to run the cloudwatch agent, then that application needs to be supported in each of app-mesh's integration scenarios (including ECS and EKS).

Are you currently working around this issue?

We are only in the prototyping stages of using app-mesh. Mostly I see this as a hindrance to adoption. If one of the primary value-adds of app-mesh is that it provides enhanced network-layer visibility, then the service ought to present that functionality by default.

Thanks!

kiranmeduri commented 5 years ago

@rclark are you using https://github.com/aws-samples/amazon-cloudwatch-container-insights/tree/master/cloudwatch-agent-dockerfile to build cloudwatch-agent docker image?

However, I still see the following problems when using it in the context of App Mesh with Fargate/ECS. We want cwagent to be started before Envoy, and hence would like to use the DependsOn configuration on the Envoy container. This means the task's network namespace needs to ignore traffic from cwagent (via the iptables rules), i.e. cwagent has to run as UID 1337. The cwagent container creates config files on startup and fails with permission denied.

2019/10/11 14:24:46 Failed to create the configuration validation file. Reason: open /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml: permission denied 
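The start ordering and traffic exemption described above could be sketched as the following CloudFormation task definition fragment. This is an illustration under assumptions, not a tested template: the container names `cwagent` and `envoy`, the image placeholders, and the application port 8080 are hypothetical, and the proxy ports are the values commonly shown in App Mesh examples.

```yaml
# Fragment of the Properties of an AWS::ECS::TaskDefinition (not a full template).
ContainerDefinitions:
  - Name: cwagent                       # hypothetical container name
    Image: '<your-cloudwatch-agent-image>'
    Essential: true
  - Name: envoy                         # hypothetical container name
    Image: '<your-appmesh-envoy-image>'
    Essential: true
    DependsOn:
      # Start Envoy only after the cwagent container has started.
      - ContainerName: cwagent
        Condition: START
ProxyConfiguration:
  Type: APPMESH
  ContainerName: envoy
  ProxyConfigurationProperties:
    # Traffic from processes running as this UID bypasses the Envoy redirect rules.
    - Name: IgnoredUID
      Value: '1337'
    - Name: ProxyIngressPort
      Value: '15000'
    - Name: ProxyEgressPort
      Value: '15001'
    - Name: AppPorts
      Value: '8080'                     # assumed application port
    - Name: EgressIgnoredIPs
      Value: '169.254.170.2,169.254.169.254'
```

With this layout, any container that must bypass the proxy has to run as the ignored UID, which is what triggers the permission error shown above.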

To get it to work, I had to create a custom image using the following Dockerfile, similar to how Istio sets it up: https://github.com/istio/istio/blob/master/docker

FROM debian:latest as build

RUN apt-get update &&  \
    apt-get install -y ca-certificates curl && \
    rm -rf /var/lib/apt/lists/*

RUN curl -O https://s3.amazonaws.com/amazoncloudwatch-agent/debian/amd64/latest/amazon-cloudwatch-agent.deb && \
    dpkg -i -E amazon-cloudwatch-agent.deb && \
    rm -rf /tmp/* && \
    rm -rf /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard && \
    rm -rf /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl && \
    rm -rf /opt/aws/amazon-cloudwatch-agent/bin/config-downloader

# NOTICE: copied from https://github.com/istio/istio/blob/master/docker/Dockerfile.base
# Change ownership to allow agent to write generated files
RUN useradd -m --uid 1337 sidecar-agent && \
    echo "sidecar-agent ALL=NOPASSWD: ALL" >> /etc/sudoers && \
    chown -R sidecar-agent /opt/aws/amazon-cloudwatch-agent

FROM scratch

COPY --from=build /tmp /tmp
COPY --from=build /etc/passwd /etc/passwd
COPY --from=build /etc/sudoers /etc/sudoers
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=build /opt/aws/amazon-cloudwatch-agent /opt/aws/amazon-cloudwatch-agent
COPY cwagentconfig /etc/cwagentconfig

USER sidecar-agent

ENV RUN_IN_CONTAINER="True"
ENTRYPOINT ["/opt/aws/amazon-cloudwatch-agent/bin/start-amazon-cloudwatch-agent"]

This gets cwagent to work as a sidecar and forward metrics from Envoy configured to publish statsd.

        Environment:
          - Name: 'ENABLE_ENVOY_XRAY_TRACING'
            Value: '1'
          - Name: 'ENABLE_ENVOY_STATS_TAGS'
            Value: '1'
          - Name: 'ENABLE_ENVOY_DOG_STATSD'
            Value: '1'
          - Name: 'APPMESH_VIRTUAL_NODE_NAME'
            Value:

This still leaves me to build a dashboard, which is not simple to do and requires an understanding of Envoy.

kiranmeduri commented 5 years ago

Here are some action items to improve this

lavignes commented 4 years ago

@kiranmeduri btw, the json/toml config for statsd collection can just be passed to the cwagent image as an environment variable. There is no need to make a custom image.
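As a rough sketch of the environment-variable approach: the stock agent image reads its JSON configuration from the `CW_CONFIG_CONTENT` variable. The fragment below assumes a CloudFormation container definition, a container named `cwagent` (a placeholder), and Envoy sending DogStatsD to 127.0.0.1:8125, which I believe is the App Mesh Envoy image's default. The interval values are illustrative, not recommendations.

```yaml
# Fragment of ContainerDefinitions in an AWS::ECS::TaskDefinition (not a full template).
- Name: cwagent                         # hypothetical container name
  Image: amazon/cloudwatch-agent:latest
  Essential: true
  Environment:
    - Name: CW_CONFIG_CONTENT
      # Agent config as a JSON string: listen for StatsD on UDP port 8125,
      # collect every 10 s and aggregate over 60 s before publishing.
      Value: >-
        {"metrics": {"metrics_collected": {"statsd": {
        "service_address": ":8125",
        "metrics_collection_interval": 10,
        "metrics_aggregation_interval": 60}}}}
```

This avoids baking a config file into the image, but it does not by itself change which user the agent runs as.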

I started making an example that uses the cloudwatch agent and xray to get data to show up in CloudWatch ServiceLens: https://github.com/lavignes/aws-app-mesh-examples/blob/service-lens/walkthroughs/howto-service-lens/app.yaml#L219

kiranmeduri commented 4 years ago

@lavignes I believe you still need to create an image if you want to run cwagent as uid 1337. Otherwise traffic from cwagent has to go via Envoy, and that may not be desirable if we want to monitor Envoy. Can you check if you can set uid to 1337 in your container def? Thanks
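For reference, setting the uid in the container definition would look roughly like the fragment below. This is only a sketch of the setting being asked about, assuming a CloudFormation task definition and a placeholder container name. Whether the stock image can actually start as this user is the open question; the permission-denied error earlier in the thread suggests it may not.

```yaml
# Fragment of ContainerDefinitions in an AWS::ECS::TaskDefinition (not a full template).
- Name: cwagent                         # hypothetical container name
  Image: amazon/cloudwatch-agent:latest
  Essential: true
  # Run the agent as the UID that the App Mesh proxy configuration ignores,
  # so its traffic does not get routed through Envoy.
  User: '1337'
```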

nbrandaleone commented 4 years ago

Hi. I am working on a blog post that shows how to view Envoy stats in CloudWatch. It is still in draft form, but you can check it out here: http://www.nickaws.net/aws/service_mesh/2019/12/29/AppMesh-Visibility.html

rclark commented 4 years ago

I appreciate that there are ways to collect envoy statistics as CloudWatch metrics, and @nbrandaleone your blog post looks super helpful towards that implementation.

But just to reiterate the key point of my original request: I shouldn't have to do this. App Mesh should be able to provide out-of-the-box metrics that provide me with a level of observability that I don't get by connecting a set of ECS services and load balancers.

I have in the past added cloudwatch agent sidecar containers to my ECS tasks in awsvpc networking mode and configured envoy to send metrics to it. However this is non-trivial...

Furthermore, learning the range of statistics emitted by Envoy and reducing them to metrics that you're interested in represents another undocumented (by AWS) learning curve.

I believe that the team should make some opinionated decisions about metrics that would be automatically aggregated to CloudWatch from envoy containers in your mesh.

lavignes commented 4 years ago

@rclark thanks for the input. There is absolutely a learning curve here that is non-trivial. I think some of the action items that @kiranmeduri listed above would move us a lot closer to what many people need. One-click options for setting up a cwagent and generating opinionated dashboards are absolutely something that App Mesh should provide.

mchene commented 4 years ago

Hey everyone, I’m a Product Manager for CloudWatch. We are looking for people to join our beta program to provide feedback and test App Mesh monitoring and troubleshooting as part of CloudWatch Container Insights. The beta program will allow you to test the collection and visualization of Prometheus metrics from Envoy. We are starting with Kubernetes. Email me if interested.

bcelenza commented 4 years ago

@mchene Could you provide an email address?

mchene commented 4 years ago

machene@amazon.com! No spamming! :)

mkielar commented 3 years ago

Oh boy, so much time has passed, and this is still not addressed in any way? I just tried using the latest-greatest public.ecr.aws/cloudwatch-agent/cloudwatch-agent:1.247349.0b251399 today, setting the user ID in my ECS container to 1337, and it still failed with:

2021/09/23 12:52:17 Failed to create the configuration validation file. Reason: open /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml: permission denied

Is there seriously no official CloudWatch agent image that would handle App Mesh out-of-the-box?

Also, duplicate: https://github.com/aws/aws-app-mesh-roadmap/issues/122

kevinten10 commented 2 years ago

Hi, is there any progress?

mkielar commented 2 years ago

For anyone still waiting for an official CloudWatch agent image that works with App Mesh/Envoy: consider migrating to the AWS OpenTelemetry Collector sidecar. Here's an example configuration: https://github.com/aws/aws-app-mesh-examples/blob/main/walkthroughs/howto-metrics-extension-ecs/README.md#optional-filtering-metrics-with-the-aws-distro-for-opentelemetry

The additional advantage is that you can filter StatsD metrics similarly to how CloudWatch filters Prometheus metrics, while still being able to process histogram metrics (like latency) that CloudWatch still cannot handle when scraping Prometheus.
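A minimal sketch of what such a collector configuration might look like, assuming the `statsd` receiver, `filter` processor, and `awsemf` exporter are available in the collector build you run. The metric-name pattern and the namespace are hypothetical placeholders, and the filter syntax has changed between collector versions, so treat the linked walkthrough as the authoritative configuration.

```yaml
receivers:
  statsd:
    # Listen for the DogStatsD packets that Envoy emits.
    endpoint: 0.0.0.0:8125
    aggregation_interval: 60s

processors:
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names:
          # Hypothetical pattern: keep only the metrics you care about.
          - '^envoy\.cluster\.upstream_rq.*$'

exporters:
  awsemf:
    # Hypothetical CloudWatch namespace.
    namespace: AppMesh/Envoy

service:
  pipelines:
    metrics:
      receivers: [statsd]
      processors: [filter]
      exporters: [awsemf]
```

Filtering in the collector keeps the number of published CloudWatch metrics down, since everything matching the include pattern is exported and the rest is dropped.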