aws-observability / aws-otel-collector

AWS Distro for OpenTelemetry Collector (see ADOT Roadmap at https://github.com/orgs/aws-observability/projects/4)
https://aws-otel.github.io/

Why is the ADOT Collector for Prometheus unable to create EKS metrics and Container Insights in CloudWatch? #2441

Closed. jatinmehrotra closed this issue 2 months ago.

jatinmehrotra commented 10 months ago

Describe the question

I am trying to send Prometheus metrics, scraped by the ADOT Collector's Prometheus receiver, to CloudWatch instead of AMP.

Steps to reproduce if your question is related to an action

What did you expect to see?

Environment

#
# OpenTelemetry Collector configuration
# Metrics pipeline with Prometheus Receiver and Amazon CloudWatch EMF Exporter sending metrics to Amazon CloudWatch
#
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: my-collector-cloudwatch
spec:
  mode: deployment
  serviceAccount: adot-collector-sa
  podAnnotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8888'
  resources:
    requests:
      cpu: "1"
    limits:
      cpu: "1"
  env:
    - name: CLUSTER_NAME
      value: my-custom-eks-cluster
  config: |
    receivers:
      #
      # Scrape configuration for the Prometheus Receiver
      # This is the same configuration used when Prometheus is installed using the community Helm chart
      #
      prometheus:
        config:
          global:
            scrape_interval: 10s
            scrape_timeout: 10s

          scrape_configs:
          - job_name: kubernetes-apiservers
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: default;kubernetes;https
              source_labels:
              - __meta_kubernetes_namespace
              - __meta_kubernetes_service_name
              - __meta_kubernetes_endpoint_port_name
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true

          - job_name: kubernetes-nodes
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
            - role: node
            relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
            - replacement: kubernetes.default.svc:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$$1/proxy/metrics
              source_labels:
              - __meta_kubernetes_node_name
              target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true

          - job_name: kubernetes-nodes-cadvisor
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
            - role: node
            relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
            - replacement: kubernetes.default.svc:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$$1/proxy/metrics/cadvisor
              source_labels:
              - __meta_kubernetes_node_name
              target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true

          - job_name: kubernetes-service-endpoints
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scrape
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_service_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
              replacement: __param_$$1
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_service_name
              target_label: kubernetes_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: kubernetes_node

          - job_name: kubernetes-service-endpoints-slow
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_service_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
              replacement: __param_$$1
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_service_name
              target_label: kubernetes_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: kubernetes_node
            scrape_interval: 5m
            scrape_timeout: 30s

          - job_name: prometheus-pushgateway
            kubernetes_sd_configs:
            - role: service
            relabel_configs:
            - action: keep
              regex: pushgateway
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_probe

          - job_name: kubernetes-services
            kubernetes_sd_configs:
            - role: service
            metrics_path: /probe
            params:
              module:
              - http_2xx
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_probe
            - source_labels:
              - __address__
              target_label: __param_target
            - replacement: blackbox
              target_label: __address__
            - source_labels:
              - __param_target
              target_label: instance
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - source_labels:
              - __meta_kubernetes_service_name
              target_label: kubernetes_name

          - job_name: eks-custom-service-monitoring
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - action: keep
              regex: my-namespace;(9090|9121)
              source_labels:
              - __meta_kubernetes_namespace
              - __meta_kubernetes_pod_container_port_number
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_container_name
              target_label: container_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: pod_name

    processors:
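      #
      # Batch metrics for up to 1 second or 50 data points per export request
      #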
      batch/metrics:
        timeout: 1s
        send_batch_size: 50
      #
      # Processor to transform the names of existing labels and/or add new labels to the metrics identified
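      # For example, the update_label operation below renames the scraped label
      # pod_name to EKS_PodName on every metric matched by the regexp .*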
      #
      metricstransform/labelling:
        transforms:
          - include: .*
            match_type: regexp
            action: update
            operations:
              - action: add_label
                new_label: EKS_Cluster
                new_value: ${CLUSTER_NAME}
              - action: update_label
                label: pod_name
                new_label: EKS_PodName
              - action: update_label
                label: container_name
                new_label: EKS_ContainerName
    exporters:
      #
      # AWS EMF exporter that sends metrics data as performance log events to Amazon CloudWatch
      # Only the metrics that were filtered out by the processors get to this stage of the pipeline
      # Under the metric_declarations field, add one or more sets of Amazon CloudWatch dimensions
      # Each dimension must already exist as a label on the Prometheus metric
      # For each set of dimensions, add a list of metrics under the metric_name_selectors field
      # Metrics names may be listed explicitly or using regular expressions
      # A default list of metrics has been provided
      # Data from performance log events will be aggregated by Amazon CloudWatch using these dimensions to create an Amazon CloudWatch custom metric
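      # As an illustration, the first declaration below publishes container_threads
      # (among others) as a custom metric with the dimensions EKS_Cluster, EKS_Namespace
      # and EKS_PodName, provided all three labels are present on the data point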
      #
      awsemf:
        region: us-east-1
        role_arn: arn:aws:iam::xxxxxxxxxx:role/eks-role-adot-prometheus-metric-write-cloudwatch-logs
        namespace: ContainerInsights/Prometheus
        log_group_name: '/aws/containerinsights/${CLUSTER_NAME}/prometheus'
        resource_to_telemetry_conversion:
          enabled: true
        dimension_rollup_option: "NoDimensionRollup"
        parse_json_encoded_attr_values: [Sources, kubernetes]
        metric_declarations:
          - dimensions: [[EKS_Cluster, EKS_Namespace, EKS_PodName]]
            metric_name_selectors:
              - apiserver_request_.*
              - container_memory_.*
              - container_threads
              - otelcol_process_.*
          - dimensions: [[]]
            metric_name_selectors:
              - __meta_kubernetes_namespace
          - dimensions: [[]]
            metric_name_selectors:
              - __meta_kubernetes_pod_container_port_number
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [batch/metrics,metricstransform/labelling]
          exporters: [awsemf]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-prometheus-role
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - nonResourceURLs:
      - /metrics
    verbs:
      - get

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-prometheus-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-prometheus-role
subjects:
  - kind: ServiceAccount
    name: adot-collector-sa
    namespace: default
The IAM policy on the collector's service-account role, which allows it to assume the exporter roles:

{
    "Statement": [
        {
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:iam::xxxxxxxxx:role/eks-role-xray-remote-write-adot",
                "arn:aws:iam::xxxxxxxx:role/eks-role-amp-remote-write-adot",
                "arn:aws:iam::xxxxxxxxxx:role/eks-role-adot-prometheus-metric-write-cloudwatch-logs"
            ],
            "Sid": "assumeRoleToAmpRemoteWriteAdot"
        }
    ],
    "Version": "2012-10-17"
}

As per the docs, I have attached the managed AWS policy CloudWatchAgentServerPolicy.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData",
                "ec2:DescribeVolumes",
                "ec2:DescribeTags",
                "logs:PutLogEvents",
                "logs:DescribeLogStreams",
                "logs:DescribeLogGroups",
                "logs:CreateLogStream",
                "logs:CreateLogGroup"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:GetParameter"
            ],
            "Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*"
        }
    ]
}
And the trust policy on the target role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "allowRolesInWorkloadAccountsToAssumeRole",
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Additional context

This configuration works for the ADOT Collector with AMP, so IMO my custom scraping job and IAM role permissions are not incorrect.

mhausenblas commented 10 months ago

Thanks a lot for the details @jatinmehrotra. Can you please share the collector logs? From what you shared it should work, but without the collector logs it's hard to tell whether the collection or the ingestion part is failing.

jatinmehrotra commented 10 months ago

@mhausenblas

Thanks a lot for the details @jatinmehrotra. Can you please share the collector logs? From what you shared it should work, but without the collector logs it's hard to tell whether the collection or the ingestion part is failing.

By collector logs do you mean Collector pod logs or cloudwatch logs generated by the collector in the log stream?

mhausenblas commented 10 months ago

The collector pod logs

jatinmehrotra commented 10 months ago

@mhausenblas

Here are the collector pod logs

2023/10/31 08:32:46 ADOT Collector version: v0.32.0
2023/10/31 08:32:46 found no extra config, skip it, err: open /opt/aws/aws-otel-collector/etc/extracfg.txt: no such file or directory
2023/10/31 08:32:46 attn: users of the statsd receiver please refer to https://github.com/aws-observability/aws-otel-collector/issues/2249 in regards to an ADOT Collector v0.33.0 breaking change
2023-10-31T08:32:46.595Z    info    service/telemetry.go:84 Setting up own telemetry...
2023-10-31T08:32:46.596Z    info    service/telemetry.go:201    Serving Prometheus metrics  {"address": ":8888", "level": "Basic"}
2023-10-31T08:32:46.597Z    info    awsutil@v0.82.0/conn.go:256 STS Endpoint    {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "endpoint": "https://sts.us-east-1.amazonaws.com"}
2023-10-31T08:32:47.396Z    info    service/service.go:132  Starting aws-otel-collector...  {"Version": "v0.32.0", "NumCPU": 2}
2023-10-31T08:32:47.396Z    info    extensions/extensions.go:30 Starting extensions...
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:230  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-apiservers"}
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:239  Starting discovery manager  {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:230  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-nodes"}
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:230  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-nodes-cadvisor"}
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:230  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-service-endpoints"}
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:230  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-service-endpoints-slow"}
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:230  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "prometheus-pushgateway"}
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:230  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-services"}
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:230  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "eks-custom-service-monitoring"}
2023-10-31T08:32:47.396Z    info    kubernetes/kubernetes.go:326    Using pod service account via in-cluster config {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "discovery": "kubernetes", "config": "kubernetes-services"}
2023-10-31T08:32:47.396Z    info    kubernetes/kubernetes.go:326    Using pod service account via in-cluster config {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "discovery": "kubernetes", "config": "eks-custom-service-monitoring"}
2023-10-31T08:32:47.396Z    info    kubernetes/kubernetes.go:326    Using pod service account via in-cluster config {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "discovery": "kubernetes", "config": "kubernetes-apiservers"}
2023-10-31T08:32:47.397Z    info    kubernetes/kubernetes.go:326    Using pod service account via in-cluster config {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "discovery": "kubernetes", "config": "kubernetes-nodes"}
2023-10-31T08:32:47.397Z    info    service/service.go:149  Everything is ready. Begin running and processing data.
2023-10-31T08:32:47.397Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:278  Starting scrape manager {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2023-10-31T08:32:52.603Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.container.name":"frontend","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","k8s.pod.name":"frontend-545dcdccbc-zsxcm","k8s.pod.uid":"a55aaaff-73f3-423d-aa17-26236ef39511","k8s.replicaset.name":"frontend-545dcdccbc","net.host.name":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:54.235Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 225, "LogEventsSize": 216.4423828125, "Time": 868}
2023-10-31T08:32:54.235Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.container.name":"frontend","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","k8s.pod.name":"frontend-545dcdccbc-zsxcm","k8s.pod.uid":"a55aaaff-73f3-423d-aa17-26236ef39511","k8s.replicaset.name":"frontend-545dcdccbc","net.host.name":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:54.238Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.namespace.name":"default","net.host.name":"192.168.140.82","net.host.port":"443","service.instance.id":"192.168.140.82:443","service.name":"kubernetes-apiservers"}}
2023-10-31T08:32:54.678Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 539, "LogEventsSize": 255.9189453125, "Time": 416}
2023-10-31T08:32:54.980Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 545, "LogEventsSize": 255.7861328125, "Time": 295}
2023-10-31T08:32:55.159Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 9, "LogEventsSize": 3.953125, "Time": 179}
2023-10-31T08:32:55.181Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.namespace.name":"default","net.host.name":"192.168.140.82","net.host.port":"443","service.instance.id":"192.168.140.82:443","service.name":"kubernetes-apiservers"}}
2023-10-31T08:32:55.181Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:55.375Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 69, "LogEventsSize": 76.4423828125, "Time": 188}
2023-10-31T08:32:55.388Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:55.388Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:55.583Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 212, "LogEventsSize": 203.3974609375, "Time": 188}
2023-10-31T08:32:55.596Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:55.596Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:55.793Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 114, "LogEventsSize": 137.9619140625, "Time": 186}
2023-10-31T08:32:55.807Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:55.808Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:56.002Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 223, "LogEventsSize": 214.73046875, "Time": 186}
2023-10-31T08:32:56.016Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:56.019Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"http","k8s.container.name":"robo","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","k8s.pod.name":"robo-5c6df8c54d-wxbrc","k8s.pod.uid":"1428ba14-3a16-439b-9766-a78cfff30ff3","k8s.replicaset.name":"robo-5c6df8c54d","net.host.name":"192.168.129.129","net.host.port":"9090","service.instance.id":"192.168.129.129:9090","service.name":"eks-custom-service-monitoring"}}
2023-10-31T08:32:56.264Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 59, "LogEventsSize": 44.828125, "Time": 181}
2023-10-31T08:32:56.283Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"http","k8s.container.name":"robo","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","k8s.pod.name":"robo-5c6df8c54d-wxbrc","k8s.pod.uid":"1428ba14-3a16-439b-9766-a78cfff30ff3","k8s.replicaset.name":"robo-5c6df8c54d","net.host.name":"192.168.129.129","net.host.port":"9090","service.instance.id":"192.168.129.129:9090","service.name":"eks-custom-service-monitoring"}}
2023-10-31T08:32:56.284Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:56.473Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 69, "LogEventsSize": 76.95703125, "Time": 183}
2023-10-31T08:32:56.490Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:56.491Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:56.686Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 69, "LogEventsSize": 76.6884765625, "Time": 189}
2023-10-31T08:32:56.697Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:56.698Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:56.888Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 210, "LogEventsSize": 201.091796875, "Time": 183}
2023-10-31T08:32:56.905Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:56.906Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"http","k8s.container.name":"route","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","k8s.pod.name":"route-5788d4489d-rgppl","k8s.pod.uid":"39662524-0378-4d45-8713-c062f29a571f","k8s.replicaset.name":"route-5788d4489d","net.host.name":"192.168.145.184","net.host.port":"9090","service.instance.id":"192.168.145.184:9090","service.name":"eks-custom-service-monitoring"}}
2023-10-31T08:32:57.096Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 166, "LogEventsSize": 128.2802734375, "Time": 182}
2023-10-31T08:32:57.118Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"http","k8s.container.name":"route","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","k8s.pod.name":"route-5788d4489d-rgppl","k8s.pod.uid":"39662524-0378-4d45-8713-c062f29a571f","k8s.replicaset.name":"route-5788d4489d","net.host.name":"192.168.145.184","net.host.port":"9090","service.instance.id":"192.168.145.184:9090","service.name":"eks-custom-service-monitoring"}}
2023-10-31T08:32:57.125Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"ip-192-168-104-2.ap-northeast-1.compute.internal","net.host.name":"ip-192-168-104-2.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"ip-192-168-104-2.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:57.423Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 185, "LogEventsSize": 255.5693359375, "Time": 240}

// further logs

mhausenblas commented 10 months ago

Thanks @jatinmehrotra and from the logs I don't see anything that seems suspicious. Let me dig deeper …

jatinmehrotra commented 10 months ago

@mhausenblas

Thank you for the confirmation. I'm awaiting the further steps for Container Insights and metrics.

Also, according to these docs, do I need to customise my current implementation (for example, add any other IAM permissions, or change anything in the application per se) in order to view Container Insights and custom namespace metrics?

mhausenblas commented 10 months ago

Also, according to these docs, do I need to customise my current implementation (for example, add any other IAM permissions, or change anything in the application per se) in order to view Container Insights and custom namespace metrics?

Oh, I was under the impression you followed our docs. What steps have you not done?

jatinmehrotra commented 10 months ago

@mhausenblas

Like I mentioned in this comment: https://github.com/aws-observability/aws-otel-collector/issues/2441#issue-1967965382

Precisely, I have followed this guide:

(Optional) Verify the metrics data is being sent to Amazon CloudWatch by opening the Amazon CloudWatch console and open the Metrics menu on the left. Select All metrics and click the AOCDockerDemo/AOCDockerDemoService box under custom namespaces. You can view any metrics data by selecting any grouping.

Oh, I was under the impression you followed our docs. What steps have you not done?

I referenced the page https://aws-otel.github.io/docs/getting-started/container-insights/eks-prometheus only to show that I was expecting metrics (under custom namespaces) and Container Insights as shown in its pictures; I haven't followed the steps on that page.

[Screenshot 2023-10-31 at 6:54:34 PM]

bryan-aguilar commented 10 months ago

Could you provide some examples of the CW EMF logs you are seeing in /aws/containerinsights/${CLUSTER_NAME}/prometheus?

bryan-aguilar commented 10 months ago

Can you expand more on this

Used this yaml file for the collector configuration. I have modified it according to my needs: I added a separate scraping job, eks-custom-service-monitoring, for the collector and modified the exporter configuration a little.

Specifically, the "I have modified it according to my needs" part. Container Insights configs are very opinionated because the Container Insights service expects a specific set of metrics and dimensions to be available. Modifications to the configuration could break that experience. Have you tried using the config without making any modifications?
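For reference, a minimal sketch of a declaration in the unmodified Container Insights style. The ClusterName and Namespace label names follow the upstream Container Insights conventions, and the metric selector here is illustrative only:

awsemf:
  namespace: ContainerInsights/Prometheus
  log_group_name: '/aws/containerinsights/${CLUSTER_NAME}/prometheus'
  resource_to_telemetry_conversion:
    enabled: true
  metric_declarations:
    # ClusterName and Namespace must already exist as labels on the data points
    - dimensions: [[ClusterName, Namespace]]
      metric_name_selectors:
        - container_memory_usage_bytes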

jatinmehrotra commented 10 months ago

@bryan-aguilar

Could you provide some examples of the CW EMF logs you are seeing in /aws/containerinsights/${CLUSTER_NAME}/prometheus?

Log of an application pod running in the cluster, scraped by the custom job:

{
    "EKS_Cluster": "my-custom-eks-cluster",
    "EKS_ContainerName": "robo",
    "EKS_PodName": "robo-5c6df8c54d-wxbrc",
    "OTelLib": "otelcol/prometheusreceiver",
    "grpc_code": "Canceled",
    "grpc_method": "Get",
    "grpc_server_handled_total": 0,
    "grpc_service": "task.Service",
    "grpc_type": "unary",
    "http.scheme": "http",
    "k8s.container.name": "robo",
    "k8s.namespace.name": "my-namespace",
    "k8s.node.name": "fargate-ip-192-168-129-129.ap-northeast-1.compute.internal",
    "k8s.pod.name": "robo-5c6df8c54d-wxbrc",
    "k8s.pod.uid": "1428ba14-3a16-439b-9766-a78cfff30ff3",
    "k8s.replicaset.name": "robo-5c6df8c54d",
    "net.host.name": "192.168.129.129",
    "net.host.port": "9090",
    "service.instance.id": "192.168.129.129:9090",
    "service.name": "eks-custom-service-monitoring"
}

Log from the OTel receiver (kubernetes-nodes-cadvisor job):

{
    "EKS_Cluster": "my-custom-eks-cluste",
    "OTelLib": "otelcol/prometheusreceiver",
    "beta_kubernetes_io_arch": "amd64",
    "beta_kubernetes_io_os": "linux",
    "container": "router",
    "container_spec_cpu_period": 100000,
    "container_spec_cpu_quota": 25000,
    "container_spec_cpu_shares": 256,
    "container_spec_memory_limit_bytes": 0,
    "container_spec_memory_reservation_limit_bytes": 0,
    "container_spec_memory_swap_limit_bytes": 0,
    "container_start_time_seconds": 1698124489,
    "eks_amazonaws_com_compute_type": "fargate",
    "failure_domain_beta_kubernetes_io_region": "ap-northeast-1",
    "failure_domain_beta_kubernetes_io_zone": "ap-northeast-1c",
    "http.scheme": "https",
    "id": "/kubepods/burstable/pod3xxxxxxxx/xxxxxxxxx",
    "image": "xxxxxxxxxxx.xxxxxxx.ecr.us-east-1.amazonaws.com/xxxxxxxxxxxxx/route:xxxxxxxxxxxx",
    "k8s.node.name": "fargate-ip-192-168-145-184.ap-northeast-1.compute.internal",
    "kubernetes_io_arch": "amd64",
    "kubernetes_io_hostname": "fargate-ip-192-168-145-184.ap-northeast-1.compute.internal",
    "kubernetes_io_os": "linux",
    "name": "xxxxxxxxxxxxxxx",
    "namespace": "my-namespace",
    "net.host.name": "fargate-ip-192-168-145-184.ap-northeast-1.compute.internal",
    "net.host.port": "",
    "pod": "route-5788d4489d-rgppl",
    "service.instance.id": "fargate-ip-192-168-145-184.ap-northeast-1.compute.internal",
    "service.name": "kubernetes-nodes-cadvisor",
    "topology_kubernetes_io_region": "ap-northeast-1",
    "topology_kubernetes_io_zone": "ap-northeast-1c"
}

Log of an application pod running in the cluster, scraped by the custom job:

{
    "EKS_Cluster": "my-custom-eks-cluster",
    "EKS_ContainerName": "redis",
    "EKS_PodName": "redis-56944bf684-qvs8c",
    "OTelLib": "otelcol/prometheusreceiver",
    "db": "db1",
    "http.scheme": "http",
    "k8s.container.name": "redis-exporter",
    "k8s.namespace.name": "my-namespac",
    "k8s.node.name": "fargate-ip-192-168-112-42.ap-northeast-1.compute.internal",
    "k8s.pod.name": "redis-56944bf684-qvs8c",
    "k8s.pod.uid": "df01127c-d870-4488-9bd0-0f8a7f4e021d",
    "k8s.replicaset.name": "redis-56944bf684",
    "net.host.name": "192.168.112.42",
    "net.host.port": "9121",
    "redis_db_keys": 0,
    "redis_db_keys_expiring": 0,
    "service.instance.id": "192.168.112.42:9121",
    "service.name": "eks-custom-service-monitoring"
}

Are these example logs enough?

jatinmehrotra commented 10 months ago

@bryan-aguilar

CC: @mhausenblas

Can you expand more on this

Used this yaml file for the collector configuration. I have modified it according to my needs: I added a separate scraping job, eks-custom-service-monitoring, for the collector and modified the exporter configuration a little.

Specifically, the "I have modified it according to my needs" part. Container Insights configs are very opinionated because the Container Insights service expects a specific set of metrics and dimensions to be available. Modifications to the configuration could break that experience. Have you tried using the config without making any modifications?

I will divide my follow-up into two parts:

1. Results with default configuration

#
# OpenTelemetry Collector configuration
# Metrics pipeline with Prometheus Receiver and Amazon CloudWatch EMF Exporter sending metrics to Amazon CloudWatch
#
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: my-collector-cloudwatch
spec:
  mode: deployment
  serviceAccount: adot-collector-sa
  podAnnotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8888'
  resources:
    requests:
      cpu: "1"
    limits:
      cpu: "1"
  env:
    - name: CLUSTER_NAME
      value: my-eks-cluster
  config: |
    receivers:
      #
      # Scrape configuration for the Prometheus Receiver
      # This is the same configuration used when Prometheus is installed using the community Helm chart
      #
      prometheus:
        config:
          global:
            scrape_interval: 15s
            scrape_timeout: 10s

          scrape_configs:
          - job_name: kubernetes-apiservers
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: default;kubernetes;https
              source_labels:
              - __meta_kubernetes_namespace
              - __meta_kubernetes_service_name
              - __meta_kubernetes_endpoint_port_name
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true

          - job_name: kubernetes-nodes
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
            - role: node
            relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
            - replacement: kubernetes.default.svc:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$$1/proxy/metrics
              source_labels:
              - __meta_kubernetes_node_name
              target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true

          - job_name: kubernetes-nodes-cadvisor
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
            - role: node
            relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
            - replacement: kubernetes.default.svc:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$$1/proxy/metrics/cadvisor
              source_labels:
              - __meta_kubernetes_node_name
              target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true

          - job_name: kubernetes-service-endpoints
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scrape
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_service_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
              replacement: __param_$$1
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_service_name
              target_label: kubernetes_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: kubernetes_node

          - job_name: kubernetes-service-endpoints-slow
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_service_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
              replacement: __param_$$1
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_service_name
              target_label: kubernetes_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: kubernetes_node
            scrape_interval: 5m
            scrape_timeout: 30s

          - job_name: prometheus-pushgateway
            kubernetes_sd_configs:
            - role: service
            relabel_configs:
            - action: keep
              regex: pushgateway
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_probe

          - job_name: kubernetes-services
            kubernetes_sd_configs:
            - role: service
            metrics_path: /probe
            params:
              module:
              - http_2xx
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_probe
            - source_labels:
              - __address__
              target_label: __param_target
            - replacement: blackbox
              target_label: __address__
            - source_labels:
              - __param_target
              target_label: instance
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - source_labels:
              - __meta_kubernetes_service_name
              target_label: kubernetes_name

          - job_name: kubernetes-pods
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scrape
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
              replacement: __param_$$1
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: kubernetes_pod_name
            - action: drop
              regex: Pending|Succeeded|Failed|Completed
              source_labels:
              - __meta_kubernetes_pod_phase

          - job_name: kubernetes-pods-slow
            scrape_interval: 5m
            scrape_timeout: 30s
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
              replacement: __param_$$1
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: pod
            - action: drop
              regex: Pending|Succeeded|Failed|Completed
              source_labels:
              - __meta_kubernetes_pod_phase

    processors:
      batch/metrics:
        timeout: 60s
        # send_batch_size: 50
      #
      # Processor to transform the names of existing labels and/or add new labels to the metrics identified
      #
      metricstransform/labelling:
        transforms:
          - include: .*
            match_type: regexp
            action: update
            operations:
              - action: add_label
                new_label: EKS_Cluster
                new_value: ${CLUSTER_NAME}
              - action: update_label
                label: kubernetes_pod_name
                new_label: EKS_PodName
              - action: update_label
                label: kubernetes_namespace
                new_label: EKS_Namespace

    exporters:
      #
      # AWS EMF exporter that sends metrics data as performance log events to Amazon CloudWatch
      # Only the metrics that were filtered out by the processors get to this stage of the pipeline
      # Under the metric_declarations field, add one or more sets of Amazon CloudWatch dimensions
      # Each dimension must already exist as a label on the Prometheus metric
      # For each set of dimensions, add a list of metrics under the metric_name_selectors field
      # Metrics names may be listed explicitly or using regular expressions
      # A default list of metrics has been provided
      # Data from performance log events will be aggregated by Amazon CloudWatch using these dimensions to create an Amazon CloudWatch custom metric
      #
      awsemf:
        region: us-east-1
        role_arn: arn:aws:iam::xxxxxxxxxxxxx:role/role-adot-prometheus-metric-write-cloudwatch-logs
        namespace: ContainerInsights/Prometheus
        log_group_name: '/aws/containerinsights/${CLUSTER_NAME}/prometheus'
        resource_to_telemetry_conversion:
          enabled: true
        dimension_rollup_option: NoDimensionRollup
        parse_json_encoded_attr_values: [Sources, kubernetes]
        metric_declarations:
          - dimensions: [[EKS_Cluster, EKS_Namespace, EKS_PodName]]
            metric_name_selectors:
              - apiserver_request_.*
              - container_memory_.*
              - container_threads
              - otelcol_process_.*
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [batch/metrics,metricstransform/labelling]
          exporters: [awsemf]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-prometheus-role
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - nonResourceURLs:
      - /metrics
    verbs:
      - get

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-prometheus-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-prometheus-role
subjects:
  - kind: ServiceAccount
    name: adot-collector-sa
    namespace: default

Follow-up questions for the default configuration

Follow-up questions when using my configuration

Container Insights configs are very opinionated because the Container Insights service expects a specific set of metrics and dimensions to be available. Modifications to the configuration could break that experience.

jatinmehrotra commented 10 months ago

@mhausenblas

CC: @bryan-aguilar

Is there any update on this? https://github.com/aws-observability/aws-otel-collector/issues/2441#issuecomment-1790480116

AbhishPrasad commented 8 months ago

@jatinmehrotra I'm also getting the same issue. Any update or solution?

github-actions[bot] commented 6 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

jatinmehrotra commented 6 months ago

@AbhishPrasad I wasn't able to make this work, so it's still pending from my side too.

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] commented 2 months ago

This issue was closed because it has been marked as stale for 30 days with no activity.