Why does ADOT Collector for prometheus is unable to create EKS Metrics and Container Insights in Cloudwatch?

Describe the question

I am trying to send Prometheus metrics scraped by the ADOT collector by Prometheus and send it to Cloudwatch instead of AMP.

Steps to reproduce if your question is related to an action

Precisely I have followed this guide
Used this yaml file for collector configuration. I have modified it according to my needs. I have added eks-custom-service-monitoring seperate scraping job for collector and modified the exporter configuration little.
I am using separate monitoring account to implement observability and running workloads so I am using seperate IAM roles. Collector uses adot-collector-sa role ( in workload account ) which has permission to assume iam role (n monitoring account). I have added permissions below.

What did you expect to see?

As per this docs I expected the following:
- I expected to see Cloudwatch logs with log group name /aws/containerinsights/${CLUSTER_NAME}/prometheus ✅ . This also means that my custom scraping job is working absolutely fine.
- I expected to see ContainerInsights/Prometheus in cloud watch metric as custom namespace and metrics inside it as per my collector configuration❌
- I expected to see EKS cluster Performance monitoring under container insights. ❌

Environment

my collector config

#
# OpenTelemetry Collector configuration
# Metrics pipeline with Prometheus Receiver and Amazon CloudWatch EMF Exporter sending metrics to Amazon CloudWatch
#
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: my-collector-cloudwatch
spec:
  mode: deployment
  serviceAccount: adot-collector-sa
  podAnnotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8888'
  resources:
    requests:
      cpu: "1"
    limits:
      cpu: "1"
  env:
    - name: CLUSTER_NAME
      value: my-custom-eks-cluster
  config: |
    receivers:
      #
      # Scrape configuration for the Prometheus Receiver
      # This is the same configuration used when Prometheus is installed using the community Helm chart
      #
      prometheus:
        config:
          global:
            scrape_interval: 10s
            scrape_timeout: 10s

          scrape_configs:
          - job_name: kubernetes-apiservers
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: default;kubernetes;https
              source_labels:
              - __meta_kubernetes_namespace
              - __meta_kubernetes_service_name
              - __meta_kubernetes_endpoint_port_name
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true

          - job_name: kubernetes-nodes
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
            - role: node
            relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
            - replacement: kubernetes.default.svc:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$$1/proxy/metrics
              source_labels:
              - __meta_kubernetes_node_name
              target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true

          - job_name: kubernetes-nodes-cadvisor
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
            - role: node
            relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
            - replacement: kubernetes.default.svc:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$$1/proxy/metrics/cadvisor
              source_labels:
              - __meta_kubernetes_node_name
              target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true

          - job_name: kubernetes-service-endpoints
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scrape
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_service_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
              replacement: __param_$$1
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_service_name
              target_label: kubernetes_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: kubernetes_node

          - job_name: kubernetes-service-endpoints-slow
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_service_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
              replacement: __param_$$1
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_service_name
              target_label: kubernetes_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: kubernetes_node
            scrape_interval: 5m
            scrape_timeout: 30s

          - job_name: prometheus-pushgateway
            kubernetes_sd_configs:
            - role: service
            relabel_configs:
            - action: keep
              regex: pushgateway
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_probe

          - job_name: kubernetes-services
            kubernetes_sd_configs:
            - role: service
            metrics_path: /probe
            params:
              module:
              - http_2xx
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_probe
            - source_labels:
              - __address__
              target_label: __param_target
            - replacement: blackbox
              target_label: __address__
            - source_labels:
              - __param_target
              target_label: instance
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - source_labels:
              - __meta_kubernetes_service_name
              target_label: kubernetes_name

          - job_name: eks-custom-service-monitoring
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - action: keep
              regex: my-namespace;(9090|9121)
              source_labels:
              - __meta_kubernetes_namespace
              - __meta_kubernetes_pod_container_port_number
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_container_name
              target_label: container_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: pod_name

    processors:
      batch/metrics:
        timeout: 1s
        send_batch_size: 50
      #
      # Processor to transform the names of existing labels and/or add new labels to the metrics identified
      #
      metricstransform/labelling:
        transforms:
          - include: .*
            match_type: regexp
            action: update
            operations:
              - action: add_label
                new_label: EKS_Cluster
                new_value: ${CLUSTER_NAME}
              - action: update_label
                label: pod_name
                new_label: EKS_PodName
              - action: update_label
                label: container_name
                new_label: EKS_ContainerName
    exporters:
      #
      # AWS EMF exporter that sends metrics data as performance log events to Amazon CloudWatch
      # Only the metrics that were filtered out by the processors get to this stage of the pipeline
      # Under the metric_declarations field, add one or more sets of Amazon CloudWatch dimensions
      # Each dimension must alredy exist as a label on the Prometheus metric
      # For each set of dimensions, add a list of metrics under the metric_name_selectors field
      # Metrics names may be listed explicitly or using regular expressions
      # A default list of metrics has been provided
      # Data from performance log events will be aggregated by Amazon CloudWatch using these dimensions to create an Amazon CloudWatch custom metric
      #
      awsemf:
        region: us-east-1
        role_arn: arn:aws:iam::xxxxxxxxxx:role/eks-role-adot-prometheus-metric-write-cloudwatch-logs
        namespace: ContainerInsights/Prometheus
        log_group_name: '/aws/containerinsights/${CLUSTER_NAME}/prometheus'
        resource_to_telemetry_conversion:
          enabled: true
        dimension_rollup_option: "NoDimensionRollup"
        parse_json_encoded_attr_values: [Sources, kubernetes]
        metric_declarations:
          - dimensions: [[EKS_Cluster, EKS_Namespace, EKS_PodName]]
            metric_name_selectors:
              - apiserver_request_.*
              - container_memory_.*
              - container_threads
              - otelcol_process_.*
          - dimensions: [[]]
            metric_name_selectors:
              - __meta_kubernetes_namespace
          - dimensions: [[]]
            metric_name_selectors:
              - __meta_kubernetes_pod_container_port_number
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [batch/metrics,metricstransform/labelling]
          exporters: [awsemf]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-prometheus-role
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - nonResourceURLs:
      - /metrics
    verbs:
      - get

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-prometheus-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-prometheus-role
subjects:
  - kind: ServiceAccount
    name: adot-collector-sa
    namespace: default

adot-collector-sa iam role permissions. OICD trust relationship is there and already setup not adding ue to brevty

{
    "Statement": [
        {
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:iam::xxxxxxxxx:role/eks-role-xray-remote-write-adot",
                "arn:aws:iam::xxxxxxxx:role/eks-role-amp-remote-write-adot",
                "arn:aws:iam::xxxxxxxxxx:role/eks-role-adot-prometheus-metric-write-cloudwatch-logs"
            ],
            "Sid": "assumeRoleToAmpRemoteWriteAdot"
        }
    ],
    "Version": "2012-10-17"
}

eks-role-adot-prometheus-metric-write-cloudwatch-logs Permissions

As per docs I have added Managed AWS Policy of CloudWatchAgentServerPolicy.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData",
                "ec2:DescribeVolumes",
                "ec2:DescribeTags",
                "logs:PutLogEvents",
                "logs:DescribeLogStreams",
                "logs:DescribeLogGroups",
                "logs:CreateLogStream",
                "logs:CreateLogGroup"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:GetParameter"
            ],
            "Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*"
        }
    ]
}

Trust relationship for the above role to assume it

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "allowRolesInWorkloadAccountsToAssumeRole",
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Additional context

This configuration works for ADOT collector with AMP so my custom job for scraping and iam role permissions are not incorrect IMO.

Thanks a lot for the details @jatinmehrotra and can you please share the collector logs, from what you shared it should work but without the collector logs it's hard to tell if the collection or ingestion part is not working.

@mhausenblas

Thanks a lot for the details @jatinmehrotra and can you please share the collector logs, from what you shared it should work but without the collector logs it's hard to tell if the collection or ingestion part is not working.

By collector logs do you mean Collector pod logs or cloudwatch logs generated by the collector in the log stream?

The collector pod logs

@mhausenblas

Here are the collector pod logs

2023/10/31 08:32:46 ADOT Collector version: v0.32.0
2023/10/31 08:32:46 found no extra config, skip it, err: open /opt/aws/aws-otel-collector/etc/extracfg.txt: no such file or directory
2023/10/31 08:32:46 attn: users of the statsd receiver please refer to https://github.com/aws-observability/aws-otel-collector/issues/2249 in regards to an ADOT Collector v0.33.0 breaking change
2023-10-31T08:32:46.595Z    info    service/telemetry.go:84 Setting up own telemetry...
2023-10-31T08:32:46.596Z    info    service/telemetry.go:201    Serving Prometheus metrics  {"address": ":8888", "level": "Basic"}
2023-10-31T08:32:46.597Z    info    awsutil@v0.82.0/conn.go:256 STS Endpoint    {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "endpoint": "https://sts.us-east-1.amazonaws.com"}
2023-10-31T08:32:47.396Z    info    service/service.go:132  Starting aws-otel-collector...  {"Version": "v0.32.0", "NumCPU": 2}
2023-10-31T08:32:47.396Z    info    extensions/extensions.go:30 Starting extensions...
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:230  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-apiservers"}
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:239  Starting discovery manager  {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:230  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-nodes"}
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:230  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-nodes-cadvisor"}
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:230  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-service-endpoints"}
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:230  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-service-endpoints-slow"}
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:230  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "prometheus-pushgateway"}
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:230  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-services"}
2023-10-31T08:32:47.396Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:230  Scrape job added    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "eks-custom-service-monitoring"}
2023-10-31T08:32:47.396Z    info    kubernetes/kubernetes.go:326    Using pod service account via in-cluster config {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "discovery": "kubernetes", "config": "kubernetes-services"}
2023-10-31T08:32:47.396Z    info    kubernetes/kubernetes.go:326    Using pod service account via in-cluster config {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "discovery": "kubernetes", "config": "eks-custom-service-monitoring"}
2023-10-31T08:32:47.396Z    info    kubernetes/kubernetes.go:326    Using pod service account via in-cluster config {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "discovery": "kubernetes", "config": "kubernetes-apiservers"}
2023-10-31T08:32:47.397Z    info    kubernetes/kubernetes.go:326    Using pod service account via in-cluster config {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "discovery": "kubernetes", "config": "kubernetes-nodes"}
2023-10-31T08:32:47.397Z    info    service/service.go:149  Everything is ready. Begin running and processing data.
2023-10-31T08:32:47.397Z    info    prometheusreceiver@v0.82.0/metrics_receiver.go:278  Starting scrape manager {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2023-10-31T08:32:52.603Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.container.name":"frontend","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","k8s.pod.name":"frontend-545dcdccbc-zsxcm","k8s.pod.uid":"a55aaaff-73f3-423d-aa17-26236ef39511","k8s.replicaset.name":"frontend-545dcdccbc","net.host.name":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:54.235Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 225, "LogEventsSize": 216.4423828125, "Time": 868}
2023-10-31T08:32:54.235Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.container.name":"frontend","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","k8s.pod.name":"frontend-545dcdccbc-zsxcm","k8s.pod.uid":"a55aaaff-73f3-423d-aa17-26236ef39511","k8s.replicaset.name":"frontend-545dcdccbc","net.host.name":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:54.238Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.namespace.name":"default","net.host.name":"192.168.140.82","net.host.port":"443","service.instance.id":"192.168.140.82:443","service.name":"kubernetes-apiservers"}}
2023-10-31T08:32:54.678Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 539, "LogEventsSize": 255.9189453125, "Time": 416}
2023-10-31T08:32:54.980Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 545, "LogEventsSize": 255.7861328125, "Time": 295}
2023-10-31T08:32:55.159Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 9, "LogEventsSize": 3.953125, "Time": 179}
2023-10-31T08:32:55.181Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.namespace.name":"default","net.host.name":"192.168.140.82","net.host.port":"443","service.instance.id":"192.168.140.82:443","service.name":"kubernetes-apiservers"}}
2023-10-31T08:32:55.181Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:55.375Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 69, "LogEventsSize": 76.4423828125, "Time": 188}
2023-10-31T08:32:55.388Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:55.388Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:55.583Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 212, "LogEventsSize": 203.3974609375, "Time": 188}
2023-10-31T08:32:55.596Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:55.596Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:55.793Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 114, "LogEventsSize": 137.9619140625, "Time": 186}
2023-10-31T08:32:55.807Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:55.808Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:56.002Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 223, "LogEventsSize": 214.73046875, "Time": 186}
2023-10-31T08:32:56.016Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:56.019Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"http","k8s.container.name":"robo","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","k8s.pod.name":"robo-5c6df8c54d-wxbrc","k8s.pod.uid":"1428ba14-3a16-439b-9766-a78cfff30ff3","k8s.replicaset.name":"robo-5c6df8c54d","net.host.name":"192.168.129.129","net.host.port":"9090","service.instance.id":"192.168.129.129:9090","service.name":"eks-custom-service-monitoring"}}
2023-10-31T08:32:56.264Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 59, "LogEventsSize": 44.828125, "Time": 181}
2023-10-31T08:32:56.283Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"http","k8s.container.name":"robo","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","k8s.pod.name":"robo-5c6df8c54d-wxbrc","k8s.pod.uid":"1428ba14-3a16-439b-9766-a78cfff30ff3","k8s.replicaset.name":"robo-5c6df8c54d","net.host.name":"192.168.129.129","net.host.port":"9090","service.instance.id":"192.168.129.129:9090","service.name":"eks-custom-service-monitoring"}}
2023-10-31T08:32:56.284Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:56.473Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 69, "LogEventsSize": 76.95703125, "Time": 183}
2023-10-31T08:32:56.490Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:56.491Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:56.686Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 69, "LogEventsSize": 76.6884765625, "Time": 189}
2023-10-31T08:32:56.697Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:56.698Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:56.888Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 210, "LogEventsSize": 201.091796875, "Time": 183}
2023-10-31T08:32:56.905Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:56.906Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"http","k8s.container.name":"route","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","k8s.pod.name":"route-5788d4489d-rgppl","k8s.pod.uid":"39662524-0378-4d45-8713-c062f29a571f","k8s.replicaset.name":"route-5788d4489d","net.host.name":"192.168.145.184","net.host.port":"9090","service.instance.id":"192.168.145.184:9090","service.name":"eks-custom-service-monitoring"}}
2023-10-31T08:32:57.096Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 166, "LogEventsSize": 128.2802734375, "Time": 182}
2023-10-31T08:32:57.118Z    info    awsemfexporter@v0.82.0/emf_exporter.go:143  Finish processing resource metrics  {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"http","k8s.container.name":"route","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","k8s.pod.name":"route-5788d4489d-rgppl","k8s.pod.uid":"39662524-0378-4d45-8713-c062f29a571f","k8s.replicaset.name":"route-5788d4489d","net.host.name":"192.168.145.184","net.host.port":"9090","service.instance.id":"192.168.145.184:9090","service.name":"eks-custom-service-monitoring"}}
2023-10-31T08:32:57.125Z    info    awsemfexporter@v0.82.0/emf_exporter.go:90   Start processing resource metrics   {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"ip-192-168-104-2.ap-northeast-1.compute.internal","net.host.name":"ip-192-168-104-2.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"ip-192-168-104-2.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:57.423Z    info    cwlogs@v0.82.0/pusher.go:294    logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 185, "LogEventsSize": 255.5693359375, "Time": 240}

\\ further logs

Thanks @jatinmehrotra and from the logs I don't see anything that seems suspicious. Let me dig deeper …

@mhausenblas

Thank you for the confirmation and awaiting to know the further steps for container insights and metrics.

Also according to this docs, do I need to customise my current implementation( like add any other iam permissions or change from the application per se) steps in order to view container insights and custom namespace metrics?　

Also according to this docs, do I need to customise my current implementation( like add any other iam permissions or change from the application per se) steps in order to view container insights and custom namespace metrics?

Oh, I was under the impression you followed our docs. What steps have you not done?

@mhausenblas

Like i mentioned in this comment https://github.com/aws-observability/aws-otel-collector/issues/2441#issue-1967965382

Precisely I have followed this guide

Even in this guide it is mentioned that you can view metrics under custom namespace ( ofcourse the namespace mentioned in github file and in the docs is different ). I have followed the above-mentioned guide word for word.
So far I have implemented ADOT + XRAY( with sample app for traces) , ADOT + AMP + Grafana following the same guide using my current custom scraping job and it worked there was no problem so far.

(Optional) Verify the metrics data is being sent to Amazon CloudWatch by opening the Amazon CloudWatch console and open the Metrics menu on the left. Select All metrics and click the AOCDockerDemo/AOCDockerDemoService box under custom namespaces. You can view any metrics data by selecting any grouping.

Oh, I was under the impression you followed our docs. What steps have you not done?

This page https://aws-otel.github.io/docs/getting-started/container-insights/eks-prometheus, i have referenced only to tell that I was expecting metrics( under custom namespace ) and container insights as shown in the pictures. I haven't followed this page https://aws-otel.github.io/docs/getting-started/container-insights/eks-prometheus

Screenshot 2023-10-31 at 6 54 34 PM

Could you provide some examples of the CW EMF logs you are seeing in /aws/containerinsights/${CLUSTER_NAME}/prometheus?

Can you expand more on this

Used this yaml file for collector configuration. I have modified it according to my needs. I have added eks-custom-service-monitoring seperate scraping job for collector and modified the exporter configuration little.

Specifically have modified it according to my needs. Container insights configs are very opinionated because the container insights service expects a specific set of metrics and dimensions to be available. Modifications to the configuration could break that experience. Have you tried using the config without making any modificaitons?

@bryan-aguilar

Could you provide some examples of the CW EMF logs you are seeing in /aws/containerinsights/${CLUSTER_NAME}/prometheus?

log of application pod running in cluster scrapped by custom job

{
    "EKS_Cluster": "my-custom-eks-cluster",
    "EKS_ContainerName": "robo",
    "EKS_PodName": "robo-5c6df8c54d-wxbrc",
    "OTelLib": "otelcol/prometheusreceiver",
    "grpc_code": "Canceled",
    "grpc_method": "Get",
    "grpc_server_handled_total": 0,
    "grpc_service": "task.Service",
    "grpc_type": "unary",
    "http.scheme": "http",
    "k8s.container.name": "robo",
    "k8s.namespace.name": "my-namespace",
    "k8s.node.name": "fargate-ip-192-168-129-129.ap-northeast-1.compute.internal",
    "k8s.pod.name": "robo-5c6df8c54d-wxbrc",
    "k8s.pod.uid": "1428ba14-3a16-439b-9766-a78cfff30ff3",
    "k8s.replicaset.name": "robo-5c6df8c54d",
    "net.host.name": "192.168.129.129",
    "net.host.port": "9090",
    "service.instance.id": "192.168.129.129:9090",
    "service.name": "eks-custom-service-monitoring"
}

Log from Otel receiver

{
    "EKS_Cluster": "my-custom-eks-cluste",
    "OTelLib": "otelcol/prometheusreceiver",
    "beta_kubernetes_io_arch": "amd64",
    "beta_kubernetes_io_os": "linux",
    "container": "router",
    "container_spec_cpu_period": 100000,
    "container_spec_cpu_quota": 25000,
    "container_spec_cpu_shares": 256,
    "container_spec_memory_limit_bytes": 0,
    "container_spec_memory_reservation_limit_bytes": 0,
    "container_spec_memory_swap_limit_bytes": 0,
    "container_start_time_seconds": 1698124489,
    "eks_amazonaws_com_compute_type": "fargate",
    "failure_domain_beta_kubernetes_io_region": "ap-northeast-1",
    "failure_domain_beta_kubernetes_io_zone": "ap-northeast-1c",
    "http.scheme": "https",
    "id": "/kubepods/burstable/pod3xxxxxxxx/xxxxxxxxx",
    "image": "xxxxxxxxxxx.xxxxxxx.ecr.us-east-1.amazonaws.com/xxxxxxxxxxxxx/route:xxxxxxxxxxxx",
    "k8s.node.name": "fargate-ip-192-168-145-184.ap-northeast-1.compute.internal",
    "kubernetes_io_arch": "amd64",
    "kubernetes_io_hostname": "fargate-ip-192-168-145-184.ap-northeast-1.compute.internal",
    "kubernetes_io_os": "linux",
    "name": "xxxxxxxxxxxxxxx",
    "namespace": "my-namespace",
    "net.host.name": "fargate-ip-192-168-145-184.ap-northeast-1.compute.internal",
    "net.host.port": "",
    "pod": "route-5788d4489d-rgppl",
    "service.instance.id": "fargate-ip-192-168-145-184.ap-northeast-1.compute.internal",
    "service.name": "kubernetes-nodes-cadvisor",
    "topology_kubernetes_io_region": "ap-northeast-1",
    "topology_kubernetes_io_zone": "ap-northeast-1c"
}

log of application pod running in cluster scrapped by custom job

{
    "EKS_Cluster": "my-custom-eks-cluster",
    "EKS_ContainerName": "redis",
    "EKS_PodName": "redis-56944bf684-qvs8c",
    "OTelLib": "otelcol/prometheusreceiver",
    "db": "db1",
    "http.scheme": "http",
    "k8s.container.name": "redis-exporter",
    "k8s.namespace.name": "my-namespac",
    "k8s.node.name": "fargate-ip-192-168-112-42.ap-northeast-1.compute.internal",
    "k8s.pod.name": "redis-56944bf684-qvs8c",
    "k8s.pod.uid": "df01127c-d870-4488-9bd0-0f8a7f4e021d",
    "k8s.replicaset.name": "redis-56944bf684",
    "net.host.name": "192.168.112.42",
    "net.host.port": "9121",
    "redis_db_keys": 0,
    "redis_db_keys_expiring": 0,
    "service.instance.id": "192.168.112.42:9121",
    "service.name": "eks-custom-service-monitoring"
}

Are these example logs enough?

@bryan-aguilar

CC: @mhausenblas

Can you expand more on this

Used this yaml file for collector configuration. I have modified it according to my needs. I have added eks-custom-service-monitoring seperate scraping job for collector and modified the exporter configuration little.

Specifically have modified it according to my needs. Container insights configs are very opinionated because the container insights service expects a specific set of metrics and dimensions to be available. Modifications to the configuration could break that experience. Have you tried using the config without making any modificaitons?

I will divide my follow-up into 2 parts:-

Results with default configuration
Modifications to my configuration

1. Results with default configuration

When I have tried the default configuration. I was able to confirm the following:-
- Cloudwatch logs of the collector ✅
- CustomNameSpace and metrics inside it ✅
  - I am able to see metrics for the collector pod only ( which was the expected behaviour) ✅
- Container insights ❌
  - CloudWatch -> Container Insights -> Performance Monitoring -> EKS Namespace ( No selection for clusters/namespace ) ❌
- Default configuration example

# OpenTelemetry Collector configuration
# Metrics pipeline with Prometheus Receiver and Amazon CloudWatch EMF Exporter sending metrics to Amazon CloudWatch
#
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: my-collector-cloudwatch
spec:
  mode: deployment
  serviceAccount: adot-collector-sa
  podAnnotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8888'
  resources:
    requests:
      cpu: "1"
    limits:
      cpu: "1"
  env:
    - name: CLUSTER_NAME
      value: my-eks-cluster
  config: |
    receivers:
      #
      # Scrape configuration for the Prometheus Receiver
      # This is the same configuration used when Prometheus is installed using the community Helm chart
      #
      prometheus:
        config:
          global:
            scrape_interval: 15s
            scrape_timeout: 10s

          scrape_configs:
          - job_name: kubernetes-apiservers
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: default;kubernetes;https
              source_labels:
              - __meta_kubernetes_namespace
              - __meta_kubernetes_service_name
              - __meta_kubernetes_endpoint_port_name
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true

          - job_name: kubernetes-nodes
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
            - role: node
            relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
            - replacement: kubernetes.default.svc:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$$1/proxy/metrics
              source_labels:
              - __meta_kubernetes_node_name
              target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true

          - job_name: kubernetes-nodes-cadvisor
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
            - role: node
            relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
            - replacement: kubernetes.default.svc:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$$1/proxy/metrics/cadvisor
              source_labels:
              - __meta_kubernetes_node_name
              target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true

          - job_name: kubernetes-service-endpoints
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scrape
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_service_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
              replacement: __param_$$1
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_service_name
              target_label: kubernetes_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: kubernetes_node

          - job_name: kubernetes-service-endpoints-slow
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_service_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
              replacement: __param_$$1
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_service_name
              target_label: kubernetes_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: kubernetes_node
            scrape_interval: 5m
            scrape_timeout: 30s

          - job_name: prometheus-pushgateway
            kubernetes_sd_configs:
            - role: service
            relabel_configs:
            - action: keep
              regex: pushgateway
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_probe

          - job_name: kubernetes-services
            kubernetes_sd_configs:
            - role: service
            metrics_path: /probe
            params:
              module:
              - http_2xx
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_probe
            - source_labels:
              - __address__
              target_label: __param_target
            - replacement: blackbox
              target_label: __address__
            - source_labels:
              - __param_target
              target_label: instance
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - source_labels:
              - __meta_kubernetes_service_name
              target_label: kubernetes_name

          - job_name: kubernetes-pods
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scrape
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
              replacement: __param_$$1
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: kubernetes_pod_name
            - action: drop
              regex: Pending|Succeeded|Failed|Completed
              source_labels:
              - __meta_kubernetes_pod_phase

          - job_name: kubernetes-pods-slow
            scrape_interval: 5m
            scrape_timeout: 30s
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
              replacement: __param_$1
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: pod
            - action: drop
              regex: Pending|Succeeded|Failed|Completed
              source_labels:
              - __meta_kubernetes_pod_phase

    processors:
      batch/metrics:
        timeout: 60s
        # send_batch_size: 50
      #
      # Processor to transform the names of existing labels and/or add new labels to the metrics identified
      #
      metricstransform/labelling:
        transforms:
          - include: .*
            match_type: regexp
            action: update
            operations:
              - action: add_label
                new_label: EKS_Cluster
                new_value: ${CLUSTER_NAME}
              - action: update_label
                label: kubernetes_pod_name
                new_label: EKS_PodName
              - action: update_label
                label: kubernetes_namespace
                new_label: EKS_Namespace

    exporters:
      #
      # AWS EMF exporter that sends metrics data as performance log events to Amazon CloudWatch
      # Only the metrics that were filtered out by the processors get to this stage of the pipeline
      # Under the metric_declarations field, add one or more sets of Amazon CloudWatch dimensions
      # Each dimension must alredy exist as a label on the Prometheus metric
      # For each set of dimensions, add a list of metrics under the metric_name_selectors field
      # Metrics names may be listed explicitly or using regular expressions
      # A default list of metrics has been provided
      # Data from performance log events will be aggregated by Amazon CloudWatch using these dimensions to create an Amazon CloudWatch custom metric
      #
      awsemf:
        region: us-east-1
        role_arn: arn:aws:iam::xxxxxxxxxxxxx:role/role-adot-prometheus-metric-write-cloudwatch-logs
        namespace: ContainerInsights/Prometheus
        log_group_name: '/aws/containerinsights/${CLUSTER_NAME}/prometheus'
        resource_to_telemetry_conversion:
          enabled: true
        dimension_rollup_option: NoDimensionRollup
        parse_json_encoded_attr_values: [Sources, kubernetes]
        metric_declarations:
          - dimensions: [[EKS_Cluster, EKS_Namespace, EKS_PodName]]
            metric_name_selectors:
              - apiserver_request_.*
              - container_memory_.*
              - container_threads
              - otelcol_process_.*
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [batch/metrics,metricstransform/labelling]
          exporters: [awsemf]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-prometheus-role
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - nonResourceURLs:
      - /metrics
    verbs:
      - get

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-prometheus-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-prometheus-role
subjects:
  - kind: ServiceAccount
    name: adot-collector-sa
    namespace: default

Follow up questions for the default configuration

Since I am able to see the collector pod metrics I was hoping to see container insights at least for the collector pod metrics, Is there any other settings which is needed to configure collector insights even in the case of default configuration?

2. Modification I have done to my custom configuraiton

Why I did modifications?
My goal is to scrape metrics fro application pods running in my cluster in the my-custom-namespace and pods with port 9090|9121
Also I needed to scrape metrics

Which parts I modified from the default configuration?
I have removed lines 217-303 and added my custom job to scrape pod metrics
my custom job to scrape metrics
job_name: mu-custom-service-monitoring kubernetes_sd_configs:
- role: pod relabel_configs:
- action: keep regex:my-custom-namespace;(9090|9121) source_labels:
  - __meta_kubernetes_namespace
  - __meta_kubernetes_pod_container_port_number
- action: replace source_labels:
  - __meta_kubernetes_pod_container_name target_label: container_name
- action: replace source_labels:
  - __meta_kubernetes_pod_name target_label: pod_name

I am able to see the ADOT collector logs for application pod metrics ✅
- No custom namespace/metrics ❌
- No Container insights
Till this point it is the same configuration( removed 217-303 lines, added my custom scraping job) I have used ADOT collector with AMP + AMG and it worked I have verified that.

Note: application pods run in custom namespace and adot collector is running in default namespace which worked for me with ADOT + AMP + AMG

Follow-up questions when using my configuration

Container insights configs are very opinionated because the container insights service expects a specific set of metrics and dimensions to be available. Modifications to the configuration could break that experience.

What are those specific metrics and dimensions in related to @bryan-aguilar comment ?
How to ensure that I am able to add my custom configuration for scraping application pod metrics, visualise them in container insights in Cloud Watch, see metrics for those application pods is there a documentation or any pointer you can provide?
- The whole point of using custom configuration is to scrape metrics from application pod and visualise them using container insights, set up alarms on metrics as we are considering cost analysis between using AMP, AMG v/s Cloudwatch

@mhausenblas

CC: @bryan-aguilar

Is there any update to this https://github.com/aws-observability/aws-otel-collector/issues/2441#issuecomment-1790480116

@jatinmehrotra I'm also getting same issue, any update or solution?

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@AbhishPrasad I wasn't able to make this work, so its its still pending from my side too.

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

This issue was closed because it has been marked as stale for 30 days with no activity.

aws-observability / aws-otel-collector