Open rclark opened 5 years ago
@rclark are you using https://github.com/aws-samples/amazon-cloudwatch-container-insights/tree/master/cloudwatch-agent-dockerfile to build cloudwatch-agent docker image?
However, I still see the following problems when using it in the context of App Mesh with Fargate/ECS. We want cwagent to be started before Envoy and hence would like to use the DependsOn configuration on the Envoy container. This means the task's network namespace needs to ignore traffic from cwagent (via iptables rules), i.e. cwagent has to run as UID 1337. With that UID, however, the cwagent container fails with permission denied when it creates its config files on startup.
2019/10/11 14:24:46 Failed to create the configuration validation file. Reason: open /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml: permission denied
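For readers who want to see what that setup looks like, here is a rough sketch of the relevant task definition pieces in CloudFormation. It is not taken from a working stack: the container names, image placeholders, and the application port are assumptions for illustration, and only the `IgnoredUID`, `User`, and `DependsOn` parts reflect what is described above.

```yaml
# Sketch of an AWS::ECS::TaskDefinition fragment (not a complete template).
# Container names, image placeholders, and AppPorts are hypothetical.
TaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    NetworkMode: awsvpc
    RequiresCompatibilities:
      - FARGATE
    Cpu: '256'
    Memory: '512'
    ProxyConfiguration:
      Type: APPMESH
      ContainerName: envoy
      ProxyConfigurationProperties:
        # Traffic from processes running as this UID is not redirected to Envoy.
        - Name: IgnoredUID
          Value: '1337'
        - Name: ProxyIngressPort
          Value: '15000'
        - Name: ProxyEgressPort
          Value: '15001'
        - Name: AppPorts
          Value: '8080'   # hypothetical application port
        - Name: EgressIgnoredIPs
          Value: '169.254.170.2,169.254.169.254'
    ContainerDefinitions:
      - Name: cwagent
        Image: '<your-cwagent-image>'
        User: '1337'      # same UID as IgnoredUID so its traffic bypasses Envoy
        Essential: true
      - Name: envoy
        Image: '<your-envoy-image>'
        User: '1337'
        Essential: true
        DependsOn:
          # Start Envoy only after the cwagent container has started.
          - ContainerName: cwagent
            Condition: START
```

Setting `User: '1337'` on cwagent is what triggers the permission-denied error above, because the agent's install directory in the image is not writable by that UID.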
To get it to work I had to create a custom image using the following Dockerfile, similar to how Istio sets it up https://github.com/istio/istio/blob/master/docker
FROM debian:latest as build

RUN apt-get update && \
    apt-get install -y ca-certificates curl && \
    rm -rf /var/lib/apt/lists/*

RUN curl -O https://s3.amazonaws.com/amazoncloudwatch-agent/debian/amd64/latest/amazon-cloudwatch-agent.deb && \
    dpkg -i -E amazon-cloudwatch-agent.deb && \
    rm -rf /tmp/* && \
    rm -rf /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard && \
    rm -rf /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl && \
    rm -rf /opt/aws/amazon-cloudwatch-agent/bin/config-downloader

# NOTICE: copied from https://github.com/istio/istio/blob/master/docker/Dockerfile.base
# Change ownership to allow agent to write generated files
RUN useradd -m --uid 1337 sidecar-agent && \
    echo "sidecar-agent ALL=NOPASSWD: ALL" >> /etc/sudoers && \
    chown -R sidecar-agent /opt/aws/amazon-cloudwatch-agent

FROM scratch

COPY --from=build /tmp /tmp
COPY --from=build /etc/passwd /etc/passwd
COPY --from=build /etc/sudoers /etc/sudoers
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=build /opt/aws/amazon-cloudwatch-agent /opt/aws/amazon-cloudwatch-agent
COPY cwagentconfig /etc/cwagentconfig

USER sidecar-agent
ENV RUN_IN_CONTAINER="True"
ENTRYPOINT ["/opt/aws/amazon-cloudwatch-agent/bin/start-amazon-cloudwatch-agent"]
This gets cwagent to work as a sidecar and forward metrics from Envoy configured to publish statsd.
Environment:
  - Name: 'ENABLE_ENVOY_XRAY_TRACING'
    Value: '1'
  - Name: 'ENABLE_ENVOY_STATS_TAGS'
    Value: '1'
  - Name: 'ENABLE_ENVOY_DOG_STATSD'
    Value: '1'
  - Name: 'APPMESH_VIRTUAL_NODE_NAME'
    Value:
This still leaves me to build a dashboard, which is not simple to do and requires an understanding of Envoy.
Here are some action items to improve this
@kiranmeduri btw. The json/toml config for statsd collection can just be passed to the cwagent image as an environment variable. There is no need to make a custom image.
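A minimal sketch of what that could look like in a container definition, assuming the `CW_CONFIG_CONTENT` environment variable that the containerized agent reads its JSON config from. The container name, image placeholder, and interval values are assumptions; port 8125 is just the conventional StatsD port and has to match wherever Envoy sends its DogStatsD output.

```yaml
# Fragment of ContainerDefinitions; name and image are hypothetical placeholders.
- Name: cwagent
  Image: '<cloudwatch-agent-image>'
  Environment:
    - Name: CW_CONFIG_CONTENT
      # Agent configuration passed inline as JSON instead of baked into the image.
      Value: |
        {
          "metrics": {
            "metrics_collected": {
              "statsd": {
                "service_address": ":8125",
                "metrics_collection_interval": 10,
                "metrics_aggregation_interval": 60
              }
            }
          }
        }
```

This only covers getting the config into the stock image; it does not by itself address running the agent as a different UID.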
I started making an example that uses the cloudwatch agent and xray to get data to show up in CloudWatch ServiceLens: https://github.com/lavignes/aws-app-mesh-examples/blob/service-lens/walkthroughs/howto-service-lens/app.yaml#L219
@lavignes I believe you still need to create an image if you want to run cwagent as uid 1337. Otherwise traffic from cwagent has to go via Envoy, and that may not be desirable if we want to monitor Envoy. Can you check if you can set uid to 1337 in your container def? Thanks
Hi. I am working on a blog post that shows how to view Envoy stats in CloudWatch. It is still in draft form, but you can check it out here: http://www.nickaws.net/aws/service_mesh/2019/12/29/AppMesh-Visibility.html
I appreciate that there are ways to collect envoy statistics as CloudWatch metrics, and @nbrandaleone your blog post looks super helpful towards that implementation.
But just to reiterate the key point of my original request: I shouldn't have to do this. App Mesh should be able to provide out-of-the-box metrics that provide me with a level of observability that I don't get by connecting a set of ECS services and load balancers.
I have in the past added cloudwatch agent sidecar containers to my ECS tasks in awsvpc networking mode and configured envoy to send metrics to it. However this is non-trivial...
Furthermore, learning the range of statistics emitted by Envoy and reducing them to metrics that you're interested in represents another undocumented (by AWS) learning curve.
I believe that the team should make some opinionated decisions about metrics that would be automatically aggregated to CloudWatch from envoy containers in your mesh.
@rclark thanks for the input. There is absolutely a learning curve here that is non-trivial. I think some of the action items that @kiranmeduri listed above would move us a lot closer to what many people need. One-click options for setting up a cwagent and generating opinionated dashboards are absolutely something that App Mesh should provide.
Hey everyone, I’m a Product Manager for CloudWatch. We are looking for people to join our beta program to provide feedback and test App Mesh monitoring and troubleshooting as part of CloudWatch Container Insights. The beta program will allow you to test the collection and visualization of Prometheus metrics from Envoy. We are starting with Kubernetes. Email me if interested.
@mchene Could you provide an email address?
machene@amazon.com! No spamming! :)
Oh boy, so much time has passed, and this is still not addressed in any way?
I just tried using the latest-greatest public.ecr.aws/cloudwatch-agent/cloudwatch-agent:1.247349.0b251399 today, setting the userid in my ECS Container to 1337, and it still failed with:
2021/09/23 12:52:17 Failed to create the configuration validation file. Reason: open /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml: permission denied
Is there seriously no official CloudWatch agent image that would handle App Mesh out-of-the-box?
Also, duplicate: https://github.com/aws/aws-app-mesh-roadmap/issues/122
Hi, is there any progress?
For anyone still waiting for an official CloudWatch image that works with App Mesh/Envoy: consider migrating to the AWS OpenTelemetry Collector sidecar. Here's an example configuration: https://github.com/aws/aws-app-mesh-examples/blob/main/walkthroughs/howto-metrics-extension-ecs/README.md#optional-filtering-metrics-with-the-aws-distro-for-opentelemetry
The additional advantage is that you can filter StatsD metrics similarly to how CloudWatch filters Prometheus, while still being able to process histogram metrics (like latency) that CloudWatch still cannot handle when scraping Prometheus.
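As a rough idea of the shape of such a collector config (a sketch, not copied from the linked walkthrough): a StatsD receiver, a filter processor that keeps only selected metrics, and the CloudWatch EMF exporter. The metric name pattern and the namespace are hypothetical placeholders, and the exact options available depend on the collector version you run.

```yaml
# Sketch of an OpenTelemetry Collector config; pattern and namespace are hypothetical.
receivers:
  statsd:
    endpoint: 0.0.0.0:8125        # where Envoy sends its DogStatsD output
    aggregation_interval: 60s

processors:
  filter/envoy:
    metrics:
      include:
        match_type: regexp
        metric_names:
          # Keep only metrics matching these patterns; adjust to what you need.
          - '^envoy\.http\.downstream_rq_.*'

exporters:
  awsemf:
    namespace: 'AppMesh/Envoy'    # hypothetical CloudWatch namespace

service:
  pipelines:
    metrics:
      receivers: [statsd]
      processors: [filter/envoy]
      exporters: [awsemf]
```

The filter step is what keeps the number of CloudWatch metrics (and the bill) under control, since Envoy emits far more stats than most dashboards need.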
Tell us about your request
I would like to see app-mesh provide some level of out-of-the-box integration with CloudWatch. This would be an extremely useful "value-add" to present to teams looking into adopting app-mesh for their application architecture.
Which integration(s) is this request for?
Ideally, this would cover any of the potential integrations, since it's based on collection from Envoy stats, which are consistent across integrations.
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Currently envoy containers can be configured to output DogStatsD-compatible metrics, but the documentation provides us with no instructions for using that to begin accumulating metrics in CloudWatch.
Outside of using app-mesh, I have in the past added cloudwatch agent sidecar containers to my ECS tasks in awsvpc networking mode and configured envoy to send metrics to it. However this is non-trivial, as the cw-agent is not well-designed for running in a docker container. Getting this to work involved reverse-engineering various shell scripts involved in configuring the agent.
Furthermore, learning the range of statistics emitted by Envoy and reducing them to metrics that you're interested in represents another undocumented (by AWS) learning curve.
I believe that the team should make some opinionated decisions about metrics that would be automatically aggregated to CloudWatch from envoy containers in your mesh. Further documentation about how to configure those metrics to meet your needs should also be included. If the implementation requires the customer to run the cloudwatch agent, then that application needs to be supported in each of app-mesh's integration scenarios (including ECS and EKS).
Are you currently working around this issue?
We are only in the prototyping stages of using app-mesh. Mostly I see this as a hindrance to adoption. If one of the primary value-adds of app-mesh is that it provides enhanced network-layer visibility, then the service ought to present that functionality by default.
Thanks!