aws / amazon-cloudwatch-agent

CloudWatch Agent enables you to collect and export host-level metrics and logs on instances running Linux or Windows Server.

Amazon CloudWatch Agent Does Not Use Credential File for Inputs: Prometheus Scraper Configuration #741

Open · colinbjohnson opened this issue 1 year ago

colinbjohnson commented 1 year ago

Bug Report

The CloudWatch agent does not appear to use the file /root/.aws/credentials as a credential source for all components. The logs below show that the credentials file is used by the "outputs" configuration but not by at least some of the "inputs" configurations, in particular [inputs.prometheus_scraper.ecs_service_discovery.service_name_list_for_tasks].

Evidence that the credentials file is used by the "outputs" configuration - in the logs you can see that shared_credential_file is set to "/root/.aws/credentials":

[outputs]

  [[outputs.cloudwatchlogs]]
    force_flush_interval = "5s"
    log_stream_name = "7eccfaad73e7"
    profile = "AmazonCloudWatchAgent"
    region = "us-west-2"
    shared_credential_file = "/root/.aws/credentials"
    tagexclude = ["metricPath"]
    [outputs.cloudwatchlogs.tagpass]
      metricPath = ["logs"]

And evidence that the file is not being used by the inputs despite being available - in the logs there is no sign that shared_credential_file = "/root/.aws/credentials" is used:

[inputs]

  [[inputs.prometheus_scraper]]
    cluster_name = "test"
    prometheus_config_path = "/etc/cwagentprometheusconfig/prometheus.yaml"
    [inputs.prometheus_scraper.ecs_service_discovery]
      sd_cluster_region = "us-west-2"
      sd_frequency = "1m"
      sd_result_file = "/tmp/cwagent_ecs_auto_sd.yaml"
      sd_target_cluster = "ecs-prometheus"

      [[inputs.prometheus_scraper.ecs_service_discovery.service_name_list_for_tasks]]
        sd_job_name = "prometheus-exporter"
        sd_metrics_path = "/metrics"
        sd_metrics_ports = "5000"
        sd_service_name_pattern = "prometheus-exporter"
    [inputs.prometheus_scraper.tags]
      log_group_name = "prometheus"
      metricPath = "logs"

and, further, the agent then fails to obtain credentials at all:

ts=2023-04-22T18:35:02.503Z caller=start.go:325 level=info msg="Add extra relabel_configs and metric_relabel_configs to save job, instance and __name__ before user relabel"
ts=2023-04-22T18:35:02.503Z caller=start.go:342 level=info msg="Completed loading of configuration file" filename=/etc/cwagentprometheusconfig/prometheus.yaml
ts=2023-04-22T18:35:02.503Z caller=start.go:246 level=info msg="finish handling config file"
ts=2023-04-22T18:35:02.503Z caller=start.go:183 level=info msg="start discovery"
2023-04-22T18:35:03Z E! Failed to get credential from session: NoCredentialProviders: no valid providers in chain
caused by: EnvAccessKeyNotFound: failed to find credentials in the environment.
SharedCredsLoad: failed to load profile, .
EC2RoleRequestError: no EC2 instance role found
caused by: RequestError: send request failed
caused by: Get "http://169.254.169.254/latest/meta-data/iam/security-credentials/": dial tcp 169.254.169.254:80: connect: connection refused
2023-04-22T18:35:03Z D! [logagent] open file count, 0
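
For context, the chain of errors above looks like the AWS SDK for Go (v1) default credential chain: environment variables, then the shared credentials file, then the EC2 instance role. Below is a minimal sketch of the difference I suspect is at play - it is only my assumption about how the inputs side builds its session, not the agent's actual code - between relying on the default chain and being handed the configured file and profile explicitly, the way the outputs section apparently is:

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
)

func main() {
	// Default chain: env vars (AWS_ACCESS_KEY_ID/...), then the shared
	// credentials file under the default profile (or AWS_PROFILE), then the
	// EC2 instance metadata endpoint. In a plain Docker container none of
	// these succeed, which matches the NoCredentialProviders error above.
	defaultSess := session.Must(session.NewSession(&aws.Config{
		Region: aws.String("us-west-2"),
	}))
	if _, err := defaultSess.Config.Credentials.Get(); err != nil {
		fmt.Println("default chain:", err)
	}

	// What the outputs section appears to do: resolve credentials from the
	// mounted file and the named profile explicitly.
	explicitSess := session.Must(session.NewSession(&aws.Config{
		Region: aws.String("us-west-2"),
		Credentials: credentials.NewSharedCredentials(
			"/root/.aws/credentials", "AmazonCloudWatchAgent"),
	}))
	if _, err := explicitSess.Config.Credentials.Get(); err != nil {
		fmt.Println("explicit shared credentials:", err)
	}
}

Inside the container, the first call fails in exactly the way logged above (no environment variables, no default profile, no instance metadata), while the second resolves the AmazonCloudWatchAgent profile from the mounted file.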

Reproduction

To reproduce the error, use the files contained in the "Config" section below.

Expected Output

The credentials from the credentials file should be used.

Actual Output

I saw the following errors followed by an application exit:

ts=2023-04-22T18:35:02.503Z caller=start.go:325 level=info msg="Add extra relabel_configs and metric_relabel_configs to save job, instance and __name__ before user relabel"
ts=2023-04-22T18:35:02.503Z caller=start.go:342 level=info msg="Completed loading of configuration file" filename=/etc/cwagentprometheusconfig/prometheus.yaml
ts=2023-04-22T18:35:02.503Z caller=start.go:246 level=info msg="finish handling config file"
ts=2023-04-22T18:35:02.503Z caller=start.go:183 level=info msg="start discovery"
2023-04-22T18:35:03Z E! Failed to get credential from session: NoCredentialProviders: no valid providers in chain
caused by: EnvAccessKeyNotFound: failed to find credentials in the environment.
SharedCredsLoad: failed to load profile, .
EC2RoleRequestError: no EC2 instance role found
caused by: RequestError: send request failed
caused by: Get "http://169.254.169.254/latest/meta-data/iam/security-credentials/": dial tcp 169.254.169.254:80: connect: connection refused
2023-04-22T18:35:03Z D! [logagent] open file count, 0

Version

AmazonCloudWatchAgent 1.247358.0

Config

Configuration to reproduce this setup is below:

docker-compose.yml

services:
  cloudwatch-agent:
    image: public.ecr.aws/cloudwatch-agent/cloudwatch-agent:latest
    volumes:
      - $HOME/.aws/credentials:/root/.aws/credentials:ro
      - ./cloudwatch-agent/amazon-cloudwatch-agent-prometheus.json:/etc/cwagentconfig/config.json
      - ./cloudwatch-agent/prometheus.yaml:/etc/cwagentprometheusconfig/prometheus.yaml
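
As an aside: since the error chain starts with the SDK's environment and shared-credentials providers, a variant of this service that also exports the standard AWS SDK environment variables may be worth trying. I have not tested it, and AWS_PROFILE / AWS_SHARED_CREDENTIALS_FILE are the generic SDK variable names, not anything agent-specific:

services:
  cloudwatch-agent:
    image: public.ecr.aws/cloudwatch-agent/cloudwatch-agent:latest
    environment:
      # Standard AWS SDK variables; whether the prometheus_scraper input
      # honors them is exactly what this issue is about.
      AWS_PROFILE: AmazonCloudWatchAgent
      AWS_SHARED_CREDENTIALS_FILE: /root/.aws/credentials
    volumes:
      - $HOME/.aws/credentials:/root/.aws/credentials:ro
      - ./cloudwatch-agent/amazon-cloudwatch-agent-prometheus.json:/etc/cwagentconfig/config.json
      - ./cloudwatch-agent/prometheus.yaml:/etc/cwagentprometheusconfig/prometheus.yaml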

amazon-cloudwatch-agent-prometheus.json

{
  "agent":{
    "debug": true,
    "logfile": "",
    "metrics_collection_interval":10
  },
  "logs":{
    "metrics_collected":{
      "prometheus":{
        "ecs_service_discovery": {
            "sd_cluster_region": "us-west-2",
            "sd_frequency": "1m",
            "sd_result_file": "/tmp/cwagent_ecs_auto_sd.yaml",
            "sd_target_cluster": "ecs-prometheus",
            "service_name_list_for_tasks": [
              {
                "sd_job_name": "prometheus-exporter",
                "sd_metrics_path": "/metrics",
                "sd_metrics_ports": "5000",
                "sd_service_name_pattern": "prometheus-exporter"
              }
            ]
        },
        "cluster_name": "test",
        "log_group_name":"prometheus",
        "prometheus_config_path":"/etc/cwagentprometheusconfig/prometheus.yaml"
      }
    }
  }
}

prometheus.yaml

global:
  scrape_interval: 1m
  scrape_timeout: 10s
scrape_configs:
  - job_name: prometheus-exporter
    metrics_path: /metrics
    scheme: http
    file_sd_configs:
      - files:
        - /tmp/cwagent_ecs_auto_sd.yaml

Environment

I am using the Docker Image public.ecr.aws/cloudwatch-agent/cloudwatch-agent:latest - which, at this moment, contains CloudWatch agent 1.247358.0.

Additional Context

None.

SaxyPandaBear commented 1 year ago

That's interesting. We'll have to dive a little deeper into it, but if I understand correctly, you are using a static credentials file in your CloudWatch agent container, running on ECS? I don't think I've come across someone doing that yet, though I could see why someone would want to.

colinbjohnson commented 1 year ago

We actually mount the credentials file into the Docker container when running locally - we use this configuration for testing.

github-actions[bot] commented 1 year ago

This issue was marked stale due to lack of activity.

github-actions[bot] commented 3 months ago

This issue was marked stale due to lack of activity.

colinbjohnson commented 3 months ago

I wouldn't consider this issue stale - I believe it is still an issue.