aws-observability / aws-otel-collector

AWS Distro for OpenTelemetry Collector (see ADOT Roadmap at https://github.com/orgs/aws-observability/projects/4)
https://aws-otel.github.io/

How to get a unique `instance` label per task on ECS Fargate #1479

Closed. benmurden closed this issue 2 years ago.

benmurden commented 2 years ago

How can we label metrics with unique instance names when running on ECS Fargate with ADOT as a sidecar?

Each task in a service uses the same target config (e.g. 0.0.0.0:8080), so metrics from all instances are mixed together when they reach Prometheus; there are no labels that separate them.

Environment
Using the following ADOT config, deploy ADOT as a sidecar to an ECS Fargate service that scales to at least two tasks and exposes its own metrics.

receivers:
  prometheus:
    config:
      global:
        scrape_interval: 1m
        scrape_timeout: 10s
      scrape_configs:
        - job_name: $PROMETHEUS_JOB_NAME
          metrics_path: $PROMETHEUS_METRICS_PATH
          static_configs:
            - targets: [$PROMETHEUS_TARGET]
  awsecscontainermetrics:
    collection_interval: 1m

processors:
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - ecs.task.memory.utilized
          - ecs.task.memory.reserved
          - ecs.task.cpu.utilized
          - ecs.task.cpu.reserved
          - ecs.task.network.rate.rx
          - ecs.task.network.rate.tx
          - ecs.task.storage.read_bytes
          - ecs.task.storage.write_bytes

exporters:
  prometheusremotewrite:
    endpoint: $AMP_ENDPOINT
    auth:
      authenticator: sigv4auth
    resource_to_telemetry_conversion:
      enabled: true
  logging:
    loglevel: info
    sampling_initial: 5
    sampling_thereafter: 200

extensions:
  health_check:
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679
  sigv4auth:
    region: $REGION

service:
  extensions: [pprof, zpages, health_check, sigv4auth]
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [logging, prometheusremotewrite]
    metrics/ecs:
      receivers: [awsecscontainermetrics]
      processors: [filter]
      exporters: [logging, prometheusremotewrite]

What did you expect to see?
Different instance labels per task on ECS Fargate.

Additional context
Note that the awsecscontainermetrics metrics already have distinct labels per task when resource_to_telemetry_conversion is enabled; that is not the problem. We are specifically asking about a way of labelling metrics that come from applications we don't control according to their task, so they can be distinguished from one another.

We believe one way this might be solved is to deploy ADOT as a separate service and collect metrics using Service Discovery instead, which would give us unique IP addresses. However, we're interested in knowing if we have missed something with the sidecar setup.
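For reference, a rough sketch of what that separate-service approach might look like, assuming the ecs_observer extension from opentelemetry-collector-contrib; the cluster name, region, and service name pattern below are placeholders. The extension writes discovered per-task targets to a file that the prometheus receiver then reads via file_sd_configs, so each task gets its own target address:

extensions:
  ecs_observer:
    refresh_interval: 60s
    cluster_name: 'my-cluster'
    cluster_region: 'us-west-2'
    result_file: '/etc/ecs_sd_targets.yaml'
    services:
      - name_pattern: '^my-service$'
        metrics_ports: [8080]

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: ecs-service-discovery
          file_sd_configs:
            - files:
                - '/etc/ecs_sd_targets.yaml'

The extension would also need to be listed under service.extensions, and the collector's task role would need the ECS list/describe permissions required for discovery.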

goriparthivamsi commented 2 years ago

Try adding the resourcedetection processor to your otel config:

processors:
  resourcedetection/ecs:
    detectors: [env, ecs]
    override: false
    timeout: 2s
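A minimal sketch of how that might be wired into the pipeline from the original config, assuming it should apply to the application metrics coming from the prometheus receiver:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resourcedetection/ecs]
      exporters: [logging, prometheusremotewrite]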

benmurden commented 2 years ago

That looks like it would do the trick. https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/README.md#amazon-ecs

Queries the Task Metadata Endpoint (TMDE) to record information about the current ECS Task.

Thanks for pointing me in the right direction!

donkee commented 9 months ago

Sorry to comment on an old issue, but I've added the processor and it's not working as I expected. It successfully sends my ECS instance info to Prometheus, but it only exists in the target_info metric. How can I get that info to exist in all my metrics so I can use it in my queries? Can I also add custom labels/kv-pairs when using the processor?

donkee commented 9 months ago

I figured out I need

resource_to_telemetry_conversion:
  enabled: true

in my prometheusremotewrite exporter.
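As for the custom labels/kv-pairs question, one possibility (not confirmed in this thread, just a sketch) is the resource processor, which can insert arbitrary resource attributes; with resource_to_telemetry_conversion enabled those attributes become labels on every exported metric. The key and value below are made up:

processors:
  resource/custom:
    attributes:
      - key: deployment.environment
        value: staging
        action: insert

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resourcedetection/ecs, resource/custom]
      exporters: [prometheusremotewrite]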