aws-observability / aws-otel-community

Welcome to the AWS Distro for OpenTelemetry project. If you're using monitoring and observability tools for AWS products and services, this is a great place to ask questions, request features and network with other community members.
https://aws-otel.github.io/
Apache License 2.0

Could ADOT collect metrics from a Spring Boot metrics REST endpoint? #467

Open codewithabhi17 opened 1 year ago

codewithabhi17 commented 1 year ago

Hello everyone.

I have read through aws-observability, and it contains some receivers (e.g. awsecscontainermetricsreceiver, awsxrayreceiver, awscontainerinsightreceiver).

I want to collect metrics from my custom Spring Boot application endpoint using OTEL collector technology. The data flow would be like:

Spring Boot app exposes a REST API (metrics) -> OTEL receiver (e.g. ADOT receiver, Prometheus receiver) -> OTEL processor -> OTEL exporter

Eventually we want AMP to scrape these custom metrics and AMG to display the results.
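
For illustration, that flow could be sketched as a collector config along these lines. This is a minimal sketch, not a tested setup: the job name, target, `/metrics` path, region and workspace ID are all placeholders, and it assumes the collector pushes to AMP through the remote write exporter rather than AMP scraping the app directly.

```yaml
# Minimal sketch of: app metrics endpoint -> receiver -> processor -> exporter.
# All names and endpoints below are placeholders, not values from this thread.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "spring-boot-app"      # hypothetical job name
          metrics_path: "/metrics"         # assumed path; must match what the app exposes
          static_configs:
            - targets: ["localhost:8080"]  # assumed host:port of the app

processors:
  batch:                                   # batch data points before export

exporters:
  prometheusremotewrite:
    # AMP remote write URL; fill in your own region and workspace ID
    endpoint: "https://aps-workspaces.<region>.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write"
    auth:
      authenticator: sigv4auth

extensions:
  sigv4auth:                               # signs remote write requests with SigV4
    region: "<region>"
    service: "aps"

service:
  extensions: [sigv4auth]
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [prometheusremotewrite]
```

AMG would then read from the AMP workspace as a Prometheus data source.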

mhausenblas commented 1 year ago

Yes, in principle that's doable. Can you share more details about the metrics your app exposes? Are those Prometheus metrics?

codewithabhi17 commented 1 year ago

Yes, that is correct. Spring exposes metrics in a format that Prometheus can understand, through what is called an actuator endpoint. More details can be found here: https://docs.spring.io/spring-boot/docs/current/reference/html/actuator.html
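
As a rough sketch of what that looks like on the Spring side, the following `application.yml` fragment exposes the Prometheus actuator endpoint. It assumes the `micrometer-registry-prometheus` dependency is on the classpath and that no custom base path is configured; neither is confirmed for the app discussed here.

```yaml
# application.yml (sketch; assumes micrometer-registry-prometheus is on the classpath)
management:
  endpoints:
    web:
      exposure:
        # Expose the Prometheus scrape endpoint over HTTP alongside health
        include: "health,prometheus"
```

With Spring Boot defaults the metrics are then served at `/actuator/prometheus`. Settings such as `management.endpoints.web.base-path` or `server.servlet.context-path` change that path, so the collector's `metrics_path` has to match whatever the app actually serves.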

I tried this ADOT config to achieve this, but it only scrapes the ECS container metrics and not my Spring application metrics.

receivers:
  prometheus:
    config:
      global:
        scrape_interval: 15s
        scrape_timeout: 10s
      scrape_configs:
      - job_name: "prometheus"
        static_configs:
        - targets: [ 0.0.0.0:9090 ]
      - job_name: "dos-dev-api"
        metrics_path: "dos/api/metrics"
        static_configs:
          - targets: [ 0.0.0.0:8080 ]

  awsecscontainermetrics:
    collection_interval: 10s
processors:
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - ecs.task.memory.utilized
          - ecs.task.memory.reserved
          - ecs.task.cpu.utilized
          - ecs.task.cpu.reserved
          - ecs.task.network.rate.rx
          - ecs.task.network.rate.tx
          - ecs.task.storage.read_bytes
          - ecs.task.storage.write_bytes
exporters:
  prometheusremotewrite:
    endpoint: https://xxxx
    auth:
      authenticator: sigv4auth
  logging:
    loglevel: info
extensions:
  health_check:
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679
  sigv4auth:
    region: us-west-2
    service: aps
    assume_role:
      arn: arn:aws:iam::xxx
      sts_region: eu-west-2
service:
  extensions: [pprof, zpages, health_check, sigv4auth]
  telemetry:
    logs:
      level: debug
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [logging, prometheusremotewrite]
    metrics/ecs:
      receivers: [awsecscontainermetrics]
      processors: [filter]
      exporters: [logging, prometheusremotewrite]

Error:

debug scrape/scrape.go:1353 Scrape failed {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "dos-dev-api", "target": "http://0.0.0.0:8080/dos/api/metrics", "error": "server returned HTTP status 404 ", "errorVerbose": "server returned HTTP status 404 \ngithub.com/prometheus/prometheus/scrape.

mhausenblas commented 1 year ago

It appears to be the same question as on StackOverflow.

codewithabhi17 commented 1 year ago

Yes. Any suggestions on my problem statement? Does the config look OK to you? Thanks

mhausenblas commented 1 year ago

This was a note to keep the conversation in one place, with all the necessary information. For example, on SO you provide details about the compute (ECS) but not here, which makes it impossible to answer the question properly.

mhausenblas commented 1 year ago

Answered it on SO, closing this one.

codewithabhi17 commented 1 year ago

Thanks for answering my question on SO, and sorry for the confusion. I tried your suggestion, but ADOT is still unable to scrape the metrics from a custom Prometheus endpoint. However, it is able to scrape ECS Fargate container metrics, as shown by you. Since I can't update SO with my latest configuration, I am attaching the config here again for your reference.

extensions:
  health_check:
  sigv4auth:
    region: us-west-2
    service: aps
    assume_role:
      arn: XXXX
      sts_region: eu-west-2

receivers:
  awsecscontainermetrics:
    collection_interval: 10s
  prometheus:
    config:
      global:
        scrape_interval: 20s
        scrape_timeout: 10s
      scrape_configs:
        - job_name: "otel-collector"
          metrics_path: "dos/api/metrics"
          static_configs:
            - targets: [localhost:8080]

processors:
  batch/metrics:
    timeout: 60s
  resourcedetection:
    detectors:
      - env
      - system
      - ecs
      - ec2
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - ecs.task.memory.reserved
          - ecs.task.memory.utilized
          - ecs.task.cpu.reserved
          - ecs.task.cpu.utilized
          - ecs.task.network.rate.rx
          - ecs.task.network.rate.tx
          - ecs.task.storage.read_bytes
          - ecs.task.storage.write_bytes
          - container.duration

exporters:
  prometheusremotewrite:
    endpoint: XXX
    auth:
      authenticator: sigv4auth
  logging:
    loglevel: info
#    resource_to_telemetry_conversion:
#      enabled: true

service:
  telemetry:
    logs:
      level: debug
  pipelines:
    metrics/application:
      receivers: [prometheus]
      processors: [resourcedetection, batch/metrics]
      exporters: [prometheusremotewrite]
    metrics:
      receivers: [awsecscontainermetrics]
      processors: [filter]
      exporters: [prometheusremotewrite]

  extensions: [health_check, sigv4auth]

task definition:

[
  {
    "name": "${task_definition_name}",
    "image": "${docker_image_url}",
    "essential": true,
    "linuxParameters": {
      "capabilities": {
        "add": [
          "SYS_PTRACE"
        ]
      }
    },
    "environmentFiles": [
      {
        "value": "XXX",
        "type": "s3"
      }
    ],
    "portMappings": [{
      "containerPort": 8080
    }],
    "essential": true,
    "entryPoint": [
      "${start_app_script}"
    ],
    "environment": [
      {
        "name": "url",
        "value": "${web_url}"
      }
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "${dos_api_log_group}",
        "awslogs-region": "${region}",
        "awslogs-stream-prefix": "${tos_api_log_group}"
      }
    },
    "secrets": [
      {
        "name": "DB_MASTER_PASSWORD",
        "valueFrom": "${DATABASE_MASTER_PASSWORD_ARN}"
      }
    ]
  },
  {
    "name": "adot-collector",
    "image": "public.ecr.aws/aws-observability/aws-otel-collector:v0.29.1",
    "essential": true,
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "${adot_sidecar_log_group_name}",
        "awslogs-region": "${region}",
        "awslogs-stream-prefix": "${adot_sidecar_log_group_name}"
      }
    }
  },
  {
    "name": "load-gen",
    "image": "public.ecr.aws/h0h9t7p1/alpine-bash-curl-jq:latest",
    "portMappings": [
      {
        "name": "load-gen-80-tcp",
        "containerPort": 80,
        "hostPort": 80,
        "protocol": "tcp",
        "appProtocol": "http"
      }
    ],
    "essential": true,
    "command": [
      "/bin/bash",
      "-c",
      "sleep 15; while : ; do curl -s -o /dev/null localhost:8080 ; sleep 1; done"
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "${prometheus_log_group}",
        "awslogs-region": "${region}",
        "awslogs-stream-prefix": "${prometheus_log_group}"
      }
    }
  }
]

ERROR:

debug scrape/scrape.go:1353 Scrape failed {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "otel-collector", "target": "http://localhost:8080/dos/api/metrics", "error": "Get \"http://localhost:8080/tos/api/metrics\": dial tcp 127.0.0.1:8080: connect: connection refused"}

I want to make it clear one more time: ADOT is able to scrape ECS container metrics, and I see them in AMG. But it is not able to scrape metrics from the "dos/api/metrics" Spring actuator API in our Spring Boot app.

I suspect the OTEL collector is not able to resolve "localhost" from the target in adot-config.yaml, even though it is a sidecar container to my app container. I don't want to hardcode my ECS task IP address in the target because it can change over time.