aws / amazon-cloudwatch-agent

CloudWatch Agent enables you to collect and export host-level metrics and logs on instances running Linux or Windows server.
MIT License

Cloudwatch Agent metrics endpoint unavailable #1310

Open krisko opened 2 months ago

krisko commented 2 months ago

Describe the bug When running the amazon-cloudwatch-observability agent in EKS (installed as an AWS add-on), the agent pods carry Prometheus scrape annotations and a cloudwatch-agent-monitoring service is created. This service should return the Prometheus metrics exposed by the agent, but the connection fails instead:

$ curl cloudwatch-agent-monitoring.amazon-cloudwatch.svc.cluster.local:8888/metrics

curl: (7) Failed to connect to cloudwatch-agent-monitoring.amazon-cloudwatch.svc.cluster.local port 8888 after 4 ms: Could not connect to server
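curl exit code 7 only says that the TCP connection could not be established. To narrow that down, a small probe along the following lines could help separate a DNS problem from a port that nothing is listening on. This is a minimal sketch, not part of the agent or the add-on: the `probe` function and its result labels are hypothetical names chosen for illustration, and only the Python standard library is used.

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify one TCP connection attempt.

    Returns one of the (hypothetical) labels:
    'open', 'dns_failure', 'refused', 'timeout', 'error'.
    """
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # The name could not be resolved at all.
        return "dns_failure"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on that port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer within the timeout (for example, dropped packets).
        return "timeout"
    except OSError:
        # Any other socket-level failure (unreachable network, etc.).
        return "error"
```

Run from a pod inside the cluster, `probe("cloudwatch-agent-monitoring.amazon-cloudwatch.svc.cluster.local", 8888)` returning `"refused"` would suggest that the name resolves but no process is listening on port 8888, whereas `"dns_failure"` would point at the service name itself.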

Annotations on the agent pods:

prometheus.io/path: /metrics
prometheus.io/port: 8888
prometheus.io/scrape: true

Steps to reproduce Install the amazon-cloudwatch-observability add-on (tested with addon_version "v1.7.0-eksbuild.1" and "v1.10.0-eksbuild.2").

What did you expect to see? Querying the /metrics endpoint should return Prometheus-compatible metrics from the agent. The example below shows sample output from a different metrics endpoint:

$ curl opentelemetry-operator.opentelemetry-operator-system.svc.cluster.local:8080/metrics
# HELP certwatcher_read_certificate_errors_total Total number of certificate read errors
# TYPE certwatcher_read_certificate_errors_total counter
certwatcher_read_certificate_errors_total 0
# HELP certwatcher_read_certificate_total Total number of certificate reads
# TYPE certwatcher_read_certificate_total counter
certwatcher_read_certificate_total 1
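To make "Prometheus-compatible" concrete, the sample output above can be checked with a few lines of code. The following is a rough sketch that handles only the simple cases of the text exposition format (comment lines, optional labels, a numeric value, an optional timestamp); `parse_samples` is a hypothetical helper written for illustration, not an API of any Prometheus library.

```python
def parse_samples(text: str) -> dict[str, float]:
    """Extract {series: value} from Prometheus text-format output.

    Simplified: '# HELP' / '# TYPE' lines are skipped, and the series
    key is the metric name together with its label set, if any.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        if "}" in line:
            # Labels present: the last '}' ends the label set.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:]
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            raise ValueError(f"no value on sample line: {raw!r}")
        # The first field is the value; a trailing timestamp is ignored.
        samples[series] = float(fields[0])
    return samples
```

Fed the opentelemetry-operator output shown above, this yields two series, `certwatcher_read_certificate_errors_total` with value 0.0 and `certwatcher_read_certificate_total` with value 1.0. A response from the agent's port 8888 that parses the same way would count as the expected behavior.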

What did you see instead? The request fails with the following error: curl: (7) Failed to connect to cloudwatch-agent-monitoring.amazon-cloudwatch.svc.cluster.local port 8888 after 4 ms: Could not connect to server

What version did you use? Version: "v1.7.0-eksbuild.1" and "v1.10.0-eksbuild.2"

What config did you use? Config: default configuration, without any additional values

Environment OS: EKS cluster v1.29

jefchien commented 2 months ago

Related to https://github.com/aws/amazon-cloudwatch-agent-operator/issues/190