Closed jatinmehrotra closed 2 months ago
Thanks a lot for the details, @jatinmehrotra. Can you please share the collector logs? From what you shared it should work, but without the collector logs it's hard to tell whether the collection or the ingestion part is failing.
@mhausenblas
Thanks a lot for the details, @jatinmehrotra. Can you please share the collector logs? From what you shared it should work, but without the collector logs it's hard to tell whether the collection or the ingestion part is failing.
By collector logs, do you mean the collector pod logs, or the CloudWatch logs generated by the collector in the log stream?
The collector pod logs
@mhausenblas
Here are the collector pod logs:
2023/10/31 08:32:46 ADOT Collector version: v0.32.0
2023/10/31 08:32:46 found no extra config, skip it, err: open /opt/aws/aws-otel-collector/etc/extracfg.txt: no such file or directory
2023/10/31 08:32:46 attn: users of the statsd receiver please refer to https://github.com/aws-observability/aws-otel-collector/issues/2249 in regards to an ADOT Collector v0.33.0 breaking change
2023-10-31T08:32:46.595Z info service/telemetry.go:84 Setting up own telemetry...
2023-10-31T08:32:46.596Z info service/telemetry.go:201 Serving Prometheus metrics {"address": ":8888", "level": "Basic"}
2023-10-31T08:32:46.597Z info awsutil@v0.82.0/conn.go:256 STS Endpoint {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "endpoint": "https://sts.us-east-1.amazonaws.com"}
2023-10-31T08:32:47.396Z info service/service.go:132 Starting aws-otel-collector... {"Version": "v0.32.0", "NumCPU": 2}
2023-10-31T08:32:47.396Z info extensions/extensions.go:30 Starting extensions...
2023-10-31T08:32:47.396Z info prometheusreceiver@v0.82.0/metrics_receiver.go:230 Scrape job added {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-apiservers"}
2023-10-31T08:32:47.396Z info prometheusreceiver@v0.82.0/metrics_receiver.go:239 Starting discovery manager {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2023-10-31T08:32:47.396Z info prometheusreceiver@v0.82.0/metrics_receiver.go:230 Scrape job added {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-nodes"}
2023-10-31T08:32:47.396Z info prometheusreceiver@v0.82.0/metrics_receiver.go:230 Scrape job added {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-nodes-cadvisor"}
2023-10-31T08:32:47.396Z info prometheusreceiver@v0.82.0/metrics_receiver.go:230 Scrape job added {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-service-endpoints"}
2023-10-31T08:32:47.396Z info prometheusreceiver@v0.82.0/metrics_receiver.go:230 Scrape job added {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-service-endpoints-slow"}
2023-10-31T08:32:47.396Z info prometheusreceiver@v0.82.0/metrics_receiver.go:230 Scrape job added {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "prometheus-pushgateway"}
2023-10-31T08:32:47.396Z info prometheusreceiver@v0.82.0/metrics_receiver.go:230 Scrape job added {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "kubernetes-services"}
2023-10-31T08:32:47.396Z info prometheusreceiver@v0.82.0/metrics_receiver.go:230 Scrape job added {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "eks-custom-service-monitoring"}
2023-10-31T08:32:47.396Z info kubernetes/kubernetes.go:326 Using pod service account via in-cluster config {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "discovery": "kubernetes", "config": "kubernetes-services"}
2023-10-31T08:32:47.396Z info kubernetes/kubernetes.go:326 Using pod service account via in-cluster config {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "discovery": "kubernetes", "config": "eks-custom-service-monitoring"}
2023-10-31T08:32:47.396Z info kubernetes/kubernetes.go:326 Using pod service account via in-cluster config {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "discovery": "kubernetes", "config": "kubernetes-apiservers"}
2023-10-31T08:32:47.397Z info kubernetes/kubernetes.go:326 Using pod service account via in-cluster config {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "discovery": "kubernetes", "config": "kubernetes-nodes"}
2023-10-31T08:32:47.397Z info service/service.go:149 Everything is ready. Begin running and processing data.
2023-10-31T08:32:47.397Z info prometheusreceiver@v0.82.0/metrics_receiver.go:278 Starting scrape manager {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2023-10-31T08:32:52.603Z info awsemfexporter@v0.82.0/emf_exporter.go:90 Start processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.container.name":"frontend","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","k8s.pod.name":"frontend-545dcdccbc-zsxcm","k8s.pod.uid":"a55aaaff-73f3-423d-aa17-26236ef39511","k8s.replicaset.name":"frontend-545dcdccbc","net.host.name":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:54.235Z info cwlogs@v0.82.0/pusher.go:294 logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 225, "LogEventsSize": 216.4423828125, "Time": 868}
2023-10-31T08:32:54.235Z info awsemfexporter@v0.82.0/emf_exporter.go:143 Finish processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.container.name":"frontend","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","k8s.pod.name":"frontend-545dcdccbc-zsxcm","k8s.pod.uid":"a55aaaff-73f3-423d-aa17-26236ef39511","k8s.replicaset.name":"frontend-545dcdccbc","net.host.name":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-173-214.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:54.238Z info awsemfexporter@v0.82.0/emf_exporter.go:90 Start processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.namespace.name":"default","net.host.name":"192.168.140.82","net.host.port":"443","service.instance.id":"192.168.140.82:443","service.name":"kubernetes-apiservers"}}
2023-10-31T08:32:54.678Z info cwlogs@v0.82.0/pusher.go:294 logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 539, "LogEventsSize": 255.9189453125, "Time": 416}
2023-10-31T08:32:54.980Z info cwlogs@v0.82.0/pusher.go:294 logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 545, "LogEventsSize": 255.7861328125, "Time": 295}
2023-10-31T08:32:55.159Z info cwlogs@v0.82.0/pusher.go:294 logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 9, "LogEventsSize": 3.953125, "Time": 179}
2023-10-31T08:32:55.181Z info awsemfexporter@v0.82.0/emf_exporter.go:143 Finish processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.namespace.name":"default","net.host.name":"192.168.140.82","net.host.port":"443","service.instance.id":"192.168.140.82:443","service.name":"kubernetes-apiservers"}}
2023-10-31T08:32:55.181Z info awsemfexporter@v0.82.0/emf_exporter.go:90 Start processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:55.375Z info cwlogs@v0.82.0/pusher.go:294 logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 69, "LogEventsSize": 76.4423828125, "Time": 188}
2023-10-31T08:32:55.388Z info awsemfexporter@v0.82.0/emf_exporter.go:143 Finish processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:55.388Z info awsemfexporter@v0.82.0/emf_exporter.go:90 Start processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:55.583Z info cwlogs@v0.82.0/pusher.go:294 logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 212, "LogEventsSize": 203.3974609375, "Time": 188}
2023-10-31T08:32:55.596Z info awsemfexporter@v0.82.0/emf_exporter.go:143 Finish processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:55.596Z info awsemfexporter@v0.82.0/emf_exporter.go:90 Start processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:55.793Z info cwlogs@v0.82.0/pusher.go:294 logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 114, "LogEventsSize": 137.9619140625, "Time": 186}
2023-10-31T08:32:55.807Z info awsemfexporter@v0.82.0/emf_exporter.go:143 Finish processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:55.808Z info awsemfexporter@v0.82.0/emf_exporter.go:90 Start processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:56.002Z info cwlogs@v0.82.0/pusher.go:294 logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 223, "LogEventsSize": 214.73046875, "Time": 186}
2023-10-31T08:32:56.016Z info awsemfexporter@v0.82.0/emf_exporter.go:143 Finish processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-130-227.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:56.019Z info awsemfexporter@v0.82.0/emf_exporter.go:90 Start processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"http","k8s.container.name":"robo","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","k8s.pod.name":"robo-5c6df8c54d-wxbrc","k8s.pod.uid":"1428ba14-3a16-439b-9766-a78cfff30ff3","k8s.replicaset.name":"robo-5c6df8c54d","net.host.name":"192.168.129.129","net.host.port":"9090","service.instance.id":"192.168.129.129:9090","service.name":"eks-custom-service-monitoring"}}
2023-10-31T08:32:56.264Z info cwlogs@v0.82.0/pusher.go:294 logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 59, "LogEventsSize": 44.828125, "Time": 181}
2023-10-31T08:32:56.283Z info awsemfexporter@v0.82.0/emf_exporter.go:143 Finish processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"http","k8s.container.name":"robo","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-129-129.ap-northeast-1.compute.internal","k8s.pod.name":"robo-5c6df8c54d-wxbrc","k8s.pod.uid":"1428ba14-3a16-439b-9766-a78cfff30ff3","k8s.replicaset.name":"robo-5c6df8c54d","net.host.name":"192.168.129.129","net.host.port":"9090","service.instance.id":"192.168.129.129:9090","service.name":"eks-custom-service-monitoring"}}
2023-10-31T08:32:56.284Z info awsemfexporter@v0.82.0/emf_exporter.go:90 Start processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:56.473Z info cwlogs@v0.82.0/pusher.go:294 logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 69, "LogEventsSize": 76.95703125, "Time": 183}
2023-10-31T08:32:56.490Z info awsemfexporter@v0.82.0/emf_exporter.go:143 Finish processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-157-203.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:56.491Z info awsemfexporter@v0.82.0/emf_exporter.go:90 Start processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:56.686Z info cwlogs@v0.82.0/pusher.go:294 logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 69, "LogEventsSize": 76.6884765625, "Time": 189}
2023-10-31T08:32:56.697Z info awsemfexporter@v0.82.0/emf_exporter.go:143 Finish processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-166-75.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes-cadvisor"}}
2023-10-31T08:32:56.698Z info awsemfexporter@v0.82.0/emf_exporter.go:90 Start processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:56.888Z info cwlogs@v0.82.0/pusher.go:294 logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 210, "LogEventsSize": 201.091796875, "Time": 183}
2023-10-31T08:32:56.905Z info awsemfexporter@v0.82.0/emf_exporter.go:143 Finish processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","net.host.name":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"fargate-ip-192-168-143-103.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:56.906Z info awsemfexporter@v0.82.0/emf_exporter.go:90 Start processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"http","k8s.container.name":"route","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","k8s.pod.name":"route-5788d4489d-rgppl","k8s.pod.uid":"39662524-0378-4d45-8713-c062f29a571f","k8s.replicaset.name":"route-5788d4489d","net.host.name":"192.168.145.184","net.host.port":"9090","service.instance.id":"192.168.145.184:9090","service.name":"eks-custom-service-monitoring"}}
2023-10-31T08:32:57.096Z info cwlogs@v0.82.0/pusher.go:294 logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 166, "LogEventsSize": 128.2802734375, "Time": 182}
2023-10-31T08:32:57.118Z info awsemfexporter@v0.82.0/emf_exporter.go:143 Finish processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"http","k8s.container.name":"route","k8s.namespace.name":"eks-custom","k8s.node.name":"fargate-ip-192-168-145-184.ap-northeast-1.compute.internal","k8s.pod.name":"route-5788d4489d-rgppl","k8s.pod.uid":"39662524-0378-4d45-8713-c062f29a571f","k8s.replicaset.name":"route-5788d4489d","net.host.name":"192.168.145.184","net.host.port":"9090","service.instance.id":"192.168.145.184:9090","service.name":"eks-custom-service-monitoring"}}
2023-10-31T08:32:57.125Z info awsemfexporter@v0.82.0/emf_exporter.go:90 Start processing resource metrics {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "labels": {"http.scheme":"https","k8s.node.name":"ip-192-168-104-2.ap-northeast-1.compute.internal","net.host.name":"ip-192-168-104-2.ap-northeast-1.compute.internal","net.host.port":"","service.instance.id":"ip-192-168-104-2.ap-northeast-1.compute.internal","service.name":"kubernetes-nodes"}}
2023-10-31T08:32:57.423Z info cwlogs@v0.82.0/pusher.go:294 logpusher: publish log events successfully. {"kind": "exporter", "data_type": "metrics", "name": "awsemf", "NumOfLogEvents": 185, "LogEventsSize": 255.5693359375, "Time": 240}
... (further logs omitted)
Thanks @jatinmehrotra. From the logs I don't see anything that seems suspicious. Let me dig deeper …
@mhausenblas
Thank you for the confirmation; I am awaiting the next steps for Container Insights and metrics.
Also, according to these docs, do I need to customize my current implementation (e.g. add any other IAM permissions, or change anything in the application itself) in order to view Container Insights and custom namespace metrics?
Also, according to these docs, do I need to customize my current implementation (e.g. add any other IAM permissions, or change anything in the application itself) in order to view Container Insights and custom namespace metrics?
Oh, I was under the impression you followed our docs. What steps have you not done?
@mhausenblas
As I mentioned in this comment: https://github.com/aws-observability/aws-otel-collector/issues/2441#issue-1967965382
Precisely, I have followed this guide:
(Optional) Verify the metrics data is being sent to Amazon CloudWatch by opening the Amazon CloudWatch console and open the Metrics menu on the left. Select All metrics and click the AOCDockerDemo/AOCDockerDemoService box under custom namespaces. You can view any metrics data by selecting any grouping.
Oh, I was under the impression you followed our docs. What steps have you not done?
This page, https://aws-otel.github.io/docs/getting-started/container-insights/eks-prometheus, I have referenced only to say that I was expecting metrics (under a custom namespace) and Container Insights as shown in its pictures. I haven't actually followed that page.
Could you provide some examples of the CW EMF logs you are seeing in /aws/containerinsights/${CLUSTER_NAME}/prometheus?
Can you expand more on this?
Used this YAML file for the collector configuration. I have modified it according to my needs: I added a separate eks-custom-service-monitoring scrape job for the collector and modified the exporter configuration a little.
Specifically: "have modified it according to my needs". Container Insights configs are very opinionated, because the Container Insights service expects a specific set of metrics and dimensions to be available. Modifications to the configuration could break that experience. Have you tried using the config without making any modifications?
@bryan-aguilar
Could you provide some examples of the CW EMF logs you are seeing in /aws/containerinsights/${CLUSTER_NAME}/prometheus?
Log of an application pod running in the cluster, scraped by the custom job:
{
"EKS_Cluster": "my-custom-eks-cluster",
"EKS_ContainerName": "robo",
"EKS_PodName": "robo-5c6df8c54d-wxbrc",
"OTelLib": "otelcol/prometheusreceiver",
"grpc_code": "Canceled",
"grpc_method": "Get",
"grpc_server_handled_total": 0,
"grpc_service": "task.Service",
"grpc_type": "unary",
"http.scheme": "http",
"k8s.container.name": "robo",
"k8s.namespace.name": "my-namespace",
"k8s.node.name": "fargate-ip-192-168-129-129.ap-northeast-1.compute.internal",
"k8s.pod.name": "robo-5c6df8c54d-wxbrc",
"k8s.pod.uid": "1428ba14-3a16-439b-9766-a78cfff30ff3",
"k8s.replicaset.name": "robo-5c6df8c54d",
"net.host.name": "192.168.129.129",
"net.host.port": "9090",
"service.instance.id": "192.168.129.129:9090",
"service.name": "eks-custom-service-monitoring"
}
Log from the OTel receiver:
{
"EKS_Cluster": "my-custom-eks-cluster",
"OTelLib": "otelcol/prometheusreceiver",
"beta_kubernetes_io_arch": "amd64",
"beta_kubernetes_io_os": "linux",
"container": "router",
"container_spec_cpu_period": 100000,
"container_spec_cpu_quota": 25000,
"container_spec_cpu_shares": 256,
"container_spec_memory_limit_bytes": 0,
"container_spec_memory_reservation_limit_bytes": 0,
"container_spec_memory_swap_limit_bytes": 0,
"container_start_time_seconds": 1698124489,
"eks_amazonaws_com_compute_type": "fargate",
"failure_domain_beta_kubernetes_io_region": "ap-northeast-1",
"failure_domain_beta_kubernetes_io_zone": "ap-northeast-1c",
"http.scheme": "https",
"id": "/kubepods/burstable/pod3xxxxxxxx/xxxxxxxxx",
"image": "xxxxxxxxxxx.xxxxxxx.ecr.us-east-1.amazonaws.com/xxxxxxxxxxxxx/route:xxxxxxxxxxxx",
"k8s.node.name": "fargate-ip-192-168-145-184.ap-northeast-1.compute.internal",
"kubernetes_io_arch": "amd64",
"kubernetes_io_hostname": "fargate-ip-192-168-145-184.ap-northeast-1.compute.internal",
"kubernetes_io_os": "linux",
"name": "xxxxxxxxxxxxxxx",
"namespace": "my-namespace",
"net.host.name": "fargate-ip-192-168-145-184.ap-northeast-1.compute.internal",
"net.host.port": "",
"pod": "route-5788d4489d-rgppl",
"service.instance.id": "fargate-ip-192-168-145-184.ap-northeast-1.compute.internal",
"service.name": "kubernetes-nodes-cadvisor",
"topology_kubernetes_io_region": "ap-northeast-1",
"topology_kubernetes_io_zone": "ap-northeast-1c"
}
Log of an application pod running in the cluster, scraped by the custom job:
{
"EKS_Cluster": "my-custom-eks-cluster",
"EKS_ContainerName": "redis",
"EKS_PodName": "redis-56944bf684-qvs8c",
"OTelLib": "otelcol/prometheusreceiver",
"db": "db1",
"http.scheme": "http",
"k8s.container.name": "redis-exporter",
"k8s.namespace.name": "my-namespace",
"k8s.node.name": "fargate-ip-192-168-112-42.ap-northeast-1.compute.internal",
"k8s.pod.name": "redis-56944bf684-qvs8c",
"k8s.pod.uid": "df01127c-d870-4488-9bd0-0f8a7f4e021d",
"k8s.replicaset.name": "redis-56944bf684",
"net.host.name": "192.168.112.42",
"net.host.port": "9121",
"redis_db_keys": 0,
"redis_db_keys_expiring": 0,
"service.instance.id": "192.168.112.42:9121",
"service.name": "eks-custom-service-monitoring"
}
Are these example logs enough?
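(Editorial aside: Container Insights consumes CloudWatch EMF log events, which normally carry an embedded `_aws.CloudWatchMetrics` metadata block declaring the namespace, dimension sets, and metric names; the snippets above show only the flattened label/value fields. A minimal sketch of how one might check which dimension sets an event declares and whether their keys are present — the event below is hypothetical, including the `ContainerInsights/Prometheus` namespace and dimension names:)

```python
import json

# Hypothetical EMF event; real events in the log group carry a "_aws"
# metadata block alongside the flattened label/metric fields shown above.
event = json.loads("""
{
  "_aws": {
    "Timestamp": 1698741176000,
    "CloudWatchMetrics": [
      {
        "Namespace": "ContainerInsights/Prometheus",
        "Dimensions": [["ClusterName", "Namespace", "PodName"]],
        "Metrics": [{"Name": "pod_cpu_usage_total", "Unit": "None"}]
      }
    ]
  },
  "ClusterName": "my-custom-eks-cluster",
  "Namespace": "eks-custom",
  "PodName": "robo-5c6df8c54d-wxbrc",
  "pod_cpu_usage_total": 12.5
}
""")

for block in event.get("_aws", {}).get("CloudWatchMetrics", []):
    print("namespace:", block["Namespace"])
    for dims in block["Dimensions"]:
        # Every dimension named here must also exist as a top-level key,
        # otherwise CloudWatch cannot emit a datapoint for that dimension set.
        missing = [d for d in dims if d not in event]
        print(dims, "missing:", missing)
```

If an event has no `_aws` block at all, CloudWatch stores it as a plain log line and no metric (and hence no Container Insights datapoint) is extracted from it.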
@bryan-aguilar
CC: @mhausenblas
Can you expand more on this?
Used this YAML file for the collector configuration. I have modified it according to my needs: I added a separate eks-custom-service-monitoring scrape job for the collector and modified the exporter configuration a little.
Specifically: "have modified it according to my needs". Container Insights configs are very opinionated, because the Container Insights service expects a specific set of metrics and dimensions to be available. Modifications to the configuration could break that experience. Have you tried using the config without making any modifications?
I will divide my follow-up into two parts:
When I tried the default configuration, I was able to confirm the following:
Cloudwatch logs of the collector ✅
CustomNameSpace and metrics inside it ✅
Container insights ❌
Default configuration example
# OpenTelemetry Collector configuration
# Metrics pipeline with Prometheus Receiver and Amazon CloudWatch EMF Exporter sending metrics to Amazon CloudWatch
#
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
name: my-collector-cloudwatch
spec:
mode: deployment
serviceAccount: adot-collector-sa
podAnnotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '8888'
resources:
requests:
cpu: "1"
limits:
cpu: "1"
env:
- name: CLUSTER_NAME
value: my-eks-cluster
config: |
receivers:
#
# Scrape configuration for the Prometheus Receiver
# This is the same configuration used when Prometheus is installed using the community Helm chart
#
prometheus:
config:
global:
scrape_interval: 15s
scrape_timeout: 10s
scrape_configs:
- job_name: kubernetes-apiservers
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- action: keep
regex: default;kubernetes;https
source_labels:
- __meta_kubernetes_namespace
- __meta_kubernetes_service_name
- __meta_kubernetes_endpoint_port_name
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
- job_name: kubernetes-nodes
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- replacement: kubernetes.default.svc:443
target_label: __address__
- regex: (.+)
replacement: /api/v1/nodes/$$1/proxy/metrics
source_labels:
- __meta_kubernetes_node_name
target_label: __metrics_path__
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
- job_name: kubernetes-nodes-cadvisor
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- replacement: kubernetes.default.svc:443
target_label: __address__
- regex: (.+)
replacement: /api/v1/nodes/$$1/proxy/metrics/cadvisor
source_labels:
- __meta_kubernetes_node_name
target_label: __metrics_path__
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
- job_name: kubernetes-service-endpoints
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_scrape
- action: replace
regex: (https?)
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_scheme
target_label: __scheme__
- action: replace
regex: (.+)
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_path
target_label: __metrics_path__
- action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $$1:$$2
source_labels:
- __address__
- __meta_kubernetes_service_annotation_prometheus_io_port
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
replacement: __param_$$1
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- action: replace
source_labels:
- __meta_kubernetes_service_name
target_label: kubernetes_name
- action: replace
source_labels:
- __meta_kubernetes_pod_node_name
target_label: kubernetes_node
- job_name: kubernetes-service-endpoints-slow
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
- action: replace
regex: (https?)
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_scheme
target_label: __scheme__
- action: replace
regex: (.+)
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_path
target_label: __metrics_path__
- action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $$1:$$2
source_labels:
- __address__
- __meta_kubernetes_service_annotation_prometheus_io_port
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
replacement: __param_$$1
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- action: replace
source_labels:
- __meta_kubernetes_service_name
target_label: kubernetes_name
- action: replace
source_labels:
- __meta_kubernetes_pod_node_name
target_label: kubernetes_node
scrape_interval: 5m
scrape_timeout: 30s
- job_name: prometheus-pushgateway
kubernetes_sd_configs:
- role: service
relabel_configs:
- action: keep
regex: pushgateway
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_probe
- job_name: kubernetes-services
kubernetes_sd_configs:
- role: service
metrics_path: /probe
params:
module:
- http_2xx
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_probe
- source_labels:
- __address__
target_label: __param_target
- replacement: blackbox
target_label: __address__
- source_labels:
- __param_target
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- source_labels:
- __meta_kubernetes_service_name
target_label: kubernetes_name
- job_name: kubernetes-pods
kubernetes_sd_configs:
- role: pod
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_scrape
- action: replace
regex: (https?)
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_scheme
target_label: __scheme__
- action: replace
regex: (.+)
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_path
target_label: __metrics_path__
- action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $$1:$$2
source_labels:
- __address__
- __meta_kubernetes_pod_annotation_prometheus_io_port
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
replacement: __param_$$1
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- action: replace
source_labels:
- __meta_kubernetes_pod_name
target_label: kubernetes_pod_name
- action: drop
regex: Pending|Succeeded|Failed|Completed
source_labels:
- __meta_kubernetes_pod_phase
- job_name: kubernetes-pods-slow
scrape_interval: 5m
scrape_timeout: 30s
kubernetes_sd_configs:
- role: pod
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
- action: replace
regex: (https?)
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_scheme
target_label: __scheme__
- action: replace
regex: (.+)
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_path
target_label: __metrics_path__
- action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $$1:$$2
source_labels:
- __address__
- __meta_kubernetes_pod_annotation_prometheus_io_port
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
replacement: __param_$1
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: namespace
- action: replace
source_labels:
- __meta_kubernetes_pod_name
target_label: pod
- action: drop
regex: Pending|Succeeded|Failed|Completed
source_labels:
- __meta_kubernetes_pod_phase
processors:
batch/metrics:
timeout: 60s
# send_batch_size: 50
#
# Processor to transform the names of existing labels and/or add new labels to the metrics identified
#
metricstransform/labelling:
transforms:
- include: .*
match_type: regexp
action: update
operations:
- action: add_label
new_label: EKS_Cluster
new_value: ${CLUSTER_NAME}
- action: update_label
label: kubernetes_pod_name
new_label: EKS_PodName
- action: update_label
label: kubernetes_namespace
new_label: EKS_Namespace
exporters:
#
# AWS EMF exporter that sends metrics data as performance log events to Amazon CloudWatch
# Only the metrics that were filtered out by the processors get to this stage of the pipeline
# Under the metric_declarations field, add one or more sets of Amazon CloudWatch dimensions
# Each dimension must already exist as a label on the Prometheus metric
# For each set of dimensions, add a list of metrics under the metric_name_selectors field
# Metrics names may be listed explicitly or using regular expressions
# A default list of metrics has been provided
# Data from performance log events will be aggregated by Amazon CloudWatch using these dimensions to create an Amazon CloudWatch custom metric
#
awsemf:
region: us-east-1
role_arn: arn:aws:iam::xxxxxxxxxxxxx:role/role-adot-prometheus-metric-write-cloudwatch-logs
namespace: ContainerInsights/Prometheus
log_group_name: '/aws/containerinsights/${CLUSTER_NAME}/prometheus'
resource_to_telemetry_conversion:
enabled: true
dimension_rollup_option: NoDimensionRollup
parse_json_encoded_attr_values: [Sources, kubernetes]
metric_declarations:
- dimensions: [[EKS_Cluster, EKS_Namespace, EKS_PodName]]
metric_name_selectors:
- apiserver_request_.*
- container_memory_.*
- container_threads
- otelcol_process_.*
service:
pipelines:
metrics:
receivers: [prometheus]
processors: [batch/metrics,metricstransform/labelling]
exporters: [awsemf]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: otel-prometheus-role
rules:
- apiGroups:
- ""
resources:
- nodes
- nodes/proxy
- services
- endpoints
- pods
verbs:
- get
- list
- watch
- apiGroups:
- extensions
resources:
- ingresses
verbs:
- get
- list
- watch
- nonResourceURLs:
- /metrics
verbs:
- get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: otel-prometheus-role-binding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: otel-prometheus-role
subjects:
- kind: ServiceAccount
name: adot-collector-sa
namespace: default
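For reference, the kubernetes-pods job in the configuration above only keeps targets whose pods carry the prometheus.io/* annotations matched by the relabel rules. A pod would opt in to scraping with annotations like the following (the pod name, image, and annotation values here are illustrative, not taken from this issue):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                       # hypothetical pod name
  namespace: my-custom-namespace
  annotations:
    prometheus.io/scrape: "true"     # matched by the keep rule
    prometheus.io/port: "9090"       # rewrites __address__ to <pod-ip>:9090
    prometheus.io/path: "/metrics"   # rewrites __metrics_path__
spec:
  containers:
  - name: app
    image: my-app:latest             # placeholder image
    ports:
    - containerPort: 9090
```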
Since I am able to see the collector pod metrics, I was hoping to see Container Insights at least for those metrics. Are there any other settings needed to configure Container Insights, even in the case of the default configuration?
My goal is to scrape metrics from application pods running in my cluster in the my-custom-namespace namespace, and from pods exposing port 9090 or 9121.
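One way to restrict a scrape job to a specific namespace and set of container ports is with additional keep rules. A hedged sketch, assuming standard Prometheus kubernetes_sd pod-role meta labels (the job name and regexes are illustrative, not the exact job used in this issue):

```yaml
- job_name: my-custom-service-monitoring   # illustrative name
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # keep only pods in my-custom-namespace
  - action: keep
    regex: my-custom-namespace
    source_labels: [__meta_kubernetes_namespace]
  # keep only container ports 9090 or 9121
  - action: keep
    regex: 9090|9121
    source_labels: [__meta_kubernetes_pod_container_port_number]
```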
I have removed lines 217-303 and added my custom job to scrape pod metrics:

- job_name: mu-custom-service-monitoring
  kubernetes_sd_configs:
Up to this point it is the same configuration (lines 217-303 removed, my custom scraping job added). I have used this ADOT collector configuration with AMP + AMG and verified that it works.
Note: the application pods run in a custom namespace and the ADOT collector runs in the default namespace, which worked for me with ADOT + AMP + AMG.
Container insights configs are very opinionated because the container insights service expects a specific set of metrics and dimensions to be available. Modifications to the configuration could break that experience.
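Concretely, with the awsemf configuration shown above, a scraped metric only becomes a CloudWatch metric if it matches a metric_name_selectors regex and the datapoint carries every label in one of the declared dimension sets (here, NoDimensionRollup means only the exact sets apply). A minimal sketch of that constraint, using one selector from the configuration above:

```yaml
metric_declarations:
- dimensions: [[EKS_Cluster, EKS_Namespace, EKS_PodName]]
  metric_name_selectors:
  - container_threads
# container_threads is emitted as a CloudWatch metric only when the
# datapoint carries all three labels; otherwise it remains in the log
# group as an EMF log event without a matching metric definition,
# which is a common cause of "logs but no metrics".
```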
AMP + AMG vs. CloudWatch
@mhausenblas
CC: @bryan-aguilar
Is there any update on this? https://github.com/aws-observability/aws-otel-collector/issues/2441#issuecomment-1790480116
@jatinmehrotra I'm also getting the same issue, any update or solution?
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.
@AbhishPrasad I wasn't able to make this work, so it's still pending from my side too.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.
This issue was closed because it has been marked as stale for 30 days with no activity.
Describe the question
I am trying to take Prometheus metrics scraped by the ADOT collector's Prometheus receiver and send them to CloudWatch instead of AMP.
Steps to reproduce if your question is related to an action
eks-custom-service-monitoring
I added a separate scraping job for the collector and slightly modified the exporter configuration. The adot-collector-sa service account uses a role (in the workload account) which has permission to assume an IAM role (in the monitoring account); I have added the permissions below.
What did you expect to see?
As per these docs I expected the following:
- Log events in the /aws/containerinsights/${CLUSTER_NAME}/prometheus log group ✅ (this also means that my custom scraping job is working absolutely fine).
- ContainerInsights/Prometheus as a custom namespace in CloudWatch Metrics, with the metrics from my collector configuration inside it ❌
Environment
As per the docs, I have attached the AWS managed policy CloudWatchAgentServerPolicy.
Additional context
This configuration works for the ADOT collector with AMP, so IMO my custom scraping job and IAM role permissions are not incorrect.