apptality / aws-ecs-cloudmap-prometheus-sd

A comprehensive solution for dynamic Prometheus service discovery in AWS ECS using OpenTelemetry. This tool enables HTTP-based discovery of ECS tasks and CloudMap services, allowing for flexible filtering, label management, and scalable monitoring. Ideal for teams leveraging AWS ECS.
https://medium.com/@alxtsbk/scraping-prometheus-metrics-from-aws-ecs-9c8d9a1ca1bd
MIT License
aws cloudmap docker dotnetcore ecs fargate monitoring prometheus

AWS ECS & CloudMap Prometheus Discovery


Overview

This application discovers ECS and/or CloudMap resources and exposes them in a format compatible with Prometheus HTTP Service Discovery.

The application uses the AWS API to discover these resources dynamically, so Prometheus can monitor the services without hardcoded addresses and ports and without any kind of file-based discovery. It can therefore run as a standalone application, for example as its own AWS ECS Service.


Features

Below is a high-level overview of what the application can do. For the full list of supported configuration parameters, and how to override them using Docker environment variables, refer to appsettings.json.

At least one of EcsClusters or CloudMapNamespaces must be provided for the application to work. If you would like to include both Service Connect and Service Discovery targets, CloudMapNamespaces must be specified (with or without EcsClusters). When both parameters are specified, the result is the intersection: only targets that exist in both the provided ECS clusters and the provided CloudMap namespaces.
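As a rough illustration of the intersection rule above, the sketch below keeps only targets seen in both sources. It assumes, purely for illustration, that targets are matched by a shared identifier (the example output in this README shows a CloudMap instance ID equal to the ECS task ID). The function name and sample IDs are hypothetical, and the application's actual matching logic may differ.

```python
def intersect_targets(ecs_task_ids, cloudmap_instance_ids):
    """Keep only IDs present in both sources, preserving the ECS order."""
    cloudmap = set(cloudmap_instance_ids)
    return [task_id for task_id in ecs_task_ids if task_id in cloudmap]


# Hypothetical IDs: 'task-a' exists only in ECS, 'task-d' only in CloudMap.
ecs = ["task-a", "task-b", "task-c"]
cloudmap = ["task-b", "task-c", "task-d"]
print(intersect_targets(ecs, cloudmap))  # ['task-b', 'task-c']
```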

Permissions: IAM permissions are required to discover ECS clusters (ecs:Get*, ecs:List*, ecs:Describe*) and CloudMap namespaces (servicediscovery:Get*, servicediscovery:List*, servicediscovery:Discover*, route53:Get*)

Usage Example

For a full example of running this in AWS, see the /example folder. The application integrates with an OpenTelemetry receivers config.

Refer to appsettings.json for all supported configuration options.

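Example run command (the inline comments are for illustration only; remove them before running, because a comment placed after a line continuation ends the shell command):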

  docker run --rm \
    # ** REQUIRED PARAMETERS **
    # At least one of "EcsClusters" or "CloudMapNamespaces" must be provided.
    -e DiscoveryOptions__EcsClusters="<cluster1>;<cluster2>" \
    -e DiscoveryOptions__CloudMapNamespaces="<namespace1>;" \
    # If running outside of AWS - pass credentials explicitly
    # If running in AWS - use Task IAM Role.
    # IMPORTANT: Credentials need permissions to describe ECS clusters (and, optionally, CloudMap namespaces)
    -e AWS_REGION="us-west-1" \
    -e AWS_ACCESS_KEY_ID="<key>" \
    -e AWS_SECRET_ACCESS_KEY="<secret>" \
    -e AWS_SESSION_TOKEN="<session_token>" \
    # ** OPTIONAL PARAMETERS **
    # Will only include those services, which have 'prom_scrape_target' tag set to 'yes'.
    # Leave blank to include all services in cluster(s)
    -e DiscoveryOptions__EcsServiceSelectorTags="prom_scrape_target=yes;" \
    # Same as above, but for filtering out CloudMap services
    # Leave blank to include all services in namespace(s)
    -e DiscoveryOptions__CloudMapServiceSelectorTags="prom_scrape_target=yes;" \
    # Semicolon separated string of tag keys to include in the service discovery response as metadata.
    # Supports glob pattern matching using * and ?.
    -e DiscoveryOptions__EcsTaskTags="*" \
    -e DiscoveryOptions__EcsServiceTags="*" \
    -e DiscoveryOptions__CloudMapServiceTags="AmazonECS*;" \
    -e DiscoveryOptions__CloudMapNamespaceTags="*" \
    # Semicolon separated string of labels to include in the service discovery response as metadata.
    # Will be added to all discovered targets.
    -e DiscoveryOptions__ExtraPrometheusLabels="custom_static_tag=my-static-tag;" \
    # Tag prefix to identify metrics port, [path | "/metrics"], [name | ""] triplets.
    # Please refer to the configuration options to learn more.
    -e DiscoveryOptions__MetricsPathPortTagPrefix="METRICS_" \
    # Add new or modify existing labels in the response using token replacements,
    # to prevent the need of modifying your Grafana dashboards.
    # Please refer to the configuration options to learn more.
    -e DiscoveryOptions__RelabelConfigurations="cluster_and_service={{_sys_ecs_cluster}}-{{_sys_ecs_service}}" \
    # Instructs .NET application to listen on 9001 inside the container
    -e ASPNETCORE_URLS="http://*:9001" \
    # ** DOCKER **
    -p 9001:9001 \
    apptality/aws-ecs-cloudmap-prometheus-discovery:latest
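To make the tag options above more concrete, here is a minimal sketch of how a semicolon separated list of glob patterns (as in EcsTaskTags or CloudMapServiceTags) could select tag keys. It is not the application's implementation: the helper names are hypothetical, matching is assumed to be case sensitive, and Python's fnmatch also accepts [seq] patterns, which the options above do not mention.

```python
from fnmatch import fnmatchcase


def parse_patterns(option_value):
    """Split a semicolon separated option value, dropping empty entries."""
    return [part.strip() for part in option_value.split(";") if part.strip()]


def filter_tags(tags, option_value):
    """Keep tags whose key matches at least one glob pattern (* and ?)."""
    patterns = parse_patterns(option_value)
    return {
        key: value
        for key, value in tags.items()
        if any(fnmatchcase(key, pattern) for pattern in patterns)
    }


# Hypothetical resource tags.
tags = {"AmazonECSManaged": "true", "team": "payments"}
print(filter_tags(tags, "AmazonECS*;"))  # {'AmazonECSManaged': 'true'}
print(filter_tags(tags, "*") == tags)    # True
```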

Example output:

[
  {
    "targets": [
      "10.200.10.200:8080"
    ],
    "labels": {
      "__metrics_path__": "/metrics",
      "instance": "10.200.10.200",
      "scrape_target_name": "app",
      "__meta_cloudmap_service_instance_id": "c88nc14799fa46d794c1899612061h3s",
      "__meta_cloudmap_service_name": "service-app",
      "__meta_cloudmap_service_type": "ServiceConnect",
      "__meta_ecs_cluster": "my-ecs-cluster",
      "__meta_ecs_service": "my-fargate-application",
      "__meta_ecs_task": "arn:aws:ecs:us-west-2:123456789012:task/my-ecs-cluster/c88nc14799fa46d794c1899612061h3s",
      "__meta_ecs_task_definition": "arn:aws:ecs:us-west-2:123456789012:task-definition/my-fargate-application:2",
      "_sys_cloudmap_service_instance_id": "c88nc14799fa46d794c1899612061h3s",
      "_sys_cloudmap_service_name": "my-fargate-application",
      "_sys_cloudmap_service_type": "ServiceConnect",
      "_sys_ecs_cluster": "my-ecs-cluster",
      "_sys_ecs_service": "my-fargate-application",
      "_sys_ecs_task": "arn:aws:ecs:us-west-2:123456789012:task/my-ecs-cluster/c88nc14799fa46d794c1899612061h3s",
      "_sys_ecs_task_definition": "arn:aws:ecs:us-west-2:123456789012:task-definition/my-fargate-application:2",
      "prom_scrape_target": "yes",
      "AmazonECSManaged": "true",
      "custom_static_tag": "my-static-tag",
      "cluster_and_service": "my-ecs-cluster-my-fargate-application"
    }
  },
  ...
]
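The cluster_and_service label in the output above comes from the RelabelConfigurations option shown in the run command. The sketch below illustrates the token replacement idea only. The function name is hypothetical, and replacing an unknown token with an empty string is an assumption rather than documented behavior.

```python
import re

# Matches tokens such as {{_sys_ecs_cluster}}.
TOKEN = re.compile(r"\{\{\s*([^{}\s]+)\s*\}\}")


def apply_relabel(rule, labels):
    """Return a copy of labels with one 'name=template' rule applied."""
    name, template = rule.split("=", 1)
    # Assumption: tokens with no matching label become an empty string.
    value = TOKEN.sub(lambda match: labels.get(match.group(1), ""), template)
    return {**labels, name.strip(): value}


labels = {
    "_sys_ecs_cluster": "my-ecs-cluster",
    "_sys_ecs_service": "my-fargate-application",
}
rule = "cluster_and_service={{_sys_ecs_cluster}}-{{_sys_ecs_service}}"
print(apply_relabel(rule, labels)["cluster_and_service"])
# my-ecs-cluster-my-fargate-application
```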

Response Structure

The response is returned in a format compatible with Prometheus HTTP_SD:

[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]
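For readers who want to inspect the response outside Prometheus, the structure above can be flattened with a few lines of Python. This is only a sketch: the function name and the sample body are illustrative, and falling back to /metrics when __metrics_path__ is absent mirrors the Prometheus default.

```python
import json


def scrape_addresses(body):
    """Flatten an HTTP_SD JSON document into (target, labels) pairs."""
    pairs = []
    for group in json.loads(body):
        labels = group.get("labels", {})
        for target in group["targets"]:
            pairs.append((target, labels))
    return pairs


# Sample body shaped like the structure above.
body = '[{"targets": ["10.200.10.200:8080"], "labels": {"__metrics_path__": "/metrics"}}]'
for target, labels in scrape_addresses(body):
    print(target, labels.get("__metrics_path__", "/metrics"))
# 10.200.10.200:8080 /metrics
```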

Success

A successful response is returned with the application/json content type and a 200 HTTP status code.

Some clarification on labels returned:

Any other label is either derived from AWS resource tags selected via EcsTaskTags, EcsServiceTags, CloudMapServiceTags, or CloudMapNamespaceTags, supplied via ExtraPrometheusLabels, or produced by RelabelConfigurations.

Note that tags are resolved from AWS resources in the following order of precedence:

ECS Task > ECS Service > CloudMap Service > CloudMap Namespace

This means that if an ECS Service has the tag "MyCustomTag=EcsService" and a CloudMap Service has the same tag with a different value, "MyCustomTag=CloudMap", the resulting scrape target label takes the value of the former:

[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "MyCustomTag": "EcsService", ...
    }
  },
  ...
]
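The precedence order above can be sketched as a dictionary merge in which higher priority sources overwrite lower priority ones. The function name is hypothetical and this is not the application's code; it only illustrates the documented order ECS Task > ECS Service > CloudMap Service > CloudMap Namespace.

```python
def resolve_tags(ecs_task, ecs_service, cloudmap_service, cloudmap_namespace):
    """Merge tag dictionaries so that earlier arguments win on conflicts."""
    merged = {}
    # Apply the lowest priority source first; later updates overwrite it.
    for source in (cloudmap_namespace, cloudmap_service, ecs_service, ecs_task):
        merged.update(source)
    return merged


print(resolve_tags(
    ecs_task={},
    ecs_service={"MyCustomTag": "EcsService"},
    cloudmap_service={"MyCustomTag": "CloudMap"},
    cloudmap_namespace={},
))
# {'MyCustomTag': 'EcsService'}
```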

Error

When the application runs into an error, the response is returned with a 500 HTTP status code:

{
  "type": "https://tools.ietf.org/html/rfc9110#section-15.6.1",
  "title": "An error occurred while processing your request.",
  "status": 500
}

Check the server logs for exception details:

2024-08-20 21:22:23.242 -03:00 [ERR] HTTP GET /prometheus-targets responded 500 in 99.9711 ms
2024-08-20 21:22:23.243 -03:00 [ERR] An unhandled exception has occurred while executing the request.
Microsoft.Extensions.Options.OptionsValidationException: At least one of 'EcsClusters' or 'CloudMapNamespaces' name must be specified.
   at Microsoft.Extensions.Options.OptionsFactory`1.Create(String name)
...
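A client that polls the endpoint directly may want to tell the two response shapes apart. The sketch below is one possible approach, assuming the caller already has the HTTP status code and body as a string. The exception class and function name are hypothetical.

```python
import json


class DiscoveryError(Exception):
    """Raised when the discovery endpoint does not return 200."""


def targets_or_raise(status, body):
    """Return the parsed target groups, or raise with the error title."""
    if status != 200:
        try:
            problem = json.loads(body)
        except ValueError:
            problem = {}  # the body was not JSON; fall back to defaults
        title = problem.get("title", "unknown error")
        raise DiscoveryError(f"{problem.get('status', status)}: {title}")
    return json.loads(body)


error_body = '{"title": "An error occurred while processing your request.", "status": 500}'
try:
    targets_or_raise(500, error_body)
except DiscoveryError as exc:
    print(exc)  # 500: An error occurred while processing your request.
```

The error body carries no exception details, so the server logs remain the place to look for the root cause.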

Storing this image in ECR

The ./scripts/dockerhub-to-ecr.sh script is intended for users who need control over their Docker images in AWS ECR. It pulls an image from DockerHub, re-tags it, and pushes it to an ECR repository, with customizable options for platform and tagging.

Run the script specifying your ECR URL as the first positional parameter:

# <paste in your AWS credential>

cd ./scripts

chmod +x dockerhub-to-ecr.sh

./dockerhub-to-ecr.sh <target_ecr_repository_url> \
    # Defaults to 'apptality/aws-ecs-cloudmap-prometheus-discovery:latest'
    [source_dockerhub_image_url] \
    # Specify explicit value of image tag you want to label your ECR image with.
    # By default, same tag as on source image will be applied.
    [target_ecr_repository_tag (Default: same as source)] \
    # Specify docker platform to re-push. Amd64/Arm64 available.
    [docker_image_platform: 'linux/amd64' (Default) | 'linux/arm64']

Example:

./dockerhub-to-ecr.sh 123456789012.dkr.ecr.us-west-2.amazonaws.com/tools/ecs-sd

The above script will:

  1. Authenticate with AWS ECR in region us-west-2
  2. Pull linux/amd64 platform of apptality/aws-ecs-cloudmap-prometheus-discovery:latest locally
  3. Re-tag it to 123456789012.dkr.ecr.us-west-2.amazonaws.com/tools/ecs-sd:latest
  4. Push 123456789012.dkr.ecr.us-west-2.amazonaws.com/tools/ecs-sd:latest to ECR
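The four steps above map onto standard AWS CLI and Docker commands. The sketch below only builds the command strings (a dry run) and does not execute anything. It is not the contents of dockerhub-to-ecr.sh, which may work differently; the function name is hypothetical, and deriving the region from the ECR URL is an assumption.

```python
DEFAULT_SOURCE = "apptality/aws-ecs-cloudmap-prometheus-discovery:latest"


def ecr_copy_commands(target_repo, source=DEFAULT_SOURCE, tag=None, platform="linux/amd64"):
    """Build the shell commands for a DockerHub to ECR copy (dry run only)."""
    registry = target_repo.split("/", 1)[0]
    # Assumption: URL looks like <account>.dkr.ecr.<region>.amazonaws.com/...
    region = registry.split(".")[3]
    image_name = source.rsplit("/", 1)[-1]
    source_tag = image_name.split(":", 1)[1] if ":" in image_name else "latest"
    target = f"{target_repo}:{tag or source_tag}"
    return [
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        f"docker pull --platform {platform} {source}",
        f"docker tag {source} {target}",
        f"docker push {target}",
    ]


for command in ecr_copy_commands("123456789012.dkr.ecr.us-west-2.amazonaws.com/tools/ecs-sd"):
    print(command)
```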

Positional Parameters

Note: you need to create destination ECR repository first.

References

This application is heavily influenced by the following articles:

It is also worth reading up on AWS Distro for OpenTelemetry (ADOT), which provides open source APIs, libraries, and agents to collect logs, metrics, and traces from your applications.

To integrate Amazon Managed Service for Prometheus (AMP) with Grafana, refer to the AWS article "Set up Grafana open source or Grafana Enterprise for use with Amazon Managed Service for Prometheus".

Alternatives

Several other discovery tools are alternatives to this application:

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues to discuss potential improvements or features.

License

This project is licensed under the MIT License - see the LICENSE file for details.