aws / amazon-cloudwatch-agent

CloudWatch Agent enables you to collect and export host-level metrics and logs on instances running Linux or Windows server.
MIT License

[ECS Fargate] cannot use append_dimension option #864

Closed HieronyM closed 3 weeks ago

HieronyM commented 11 months ago

I tried to deploy the CloudWatch agent on AWS ECS Fargate, but when I set append_dimensions, the agent fails with this error:

Error: cannot start pipelines: EC2MetadataRequestError: failed to get EC2 instance identity document
caused by: RequestError: send request failed
caused by: Get "http://169.254.169.254/latest/dynamic/instance-identity/document": dial tcp 169.254.169.254:80: connect: invalid argument
2023-09-25T06:01:07Z E! [telegraf] Error running agent: cannot start pipelines: EC2MetadataRequestError: failed to get EC2 instance identity document
caused by: RequestError: send request failed

I think the CloudWatch agent is calling the EC2 metadata endpoint, even though my ECS tasks run on Fargate rather than EC2.
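
For illustration only, here is a minimal Python sketch of the kind of platform check that would avoid this call. It is not the agent's actual code. It assumes the container runs on ECS with task metadata endpoint version 4 available (Fargate platform version 1.4.0 or later), where ECS injects the `ECS_CONTAINER_METADATA_URI_V4` environment variable. The function name `pick_metadata_url` is hypothetical.

```python
import os

# EC2 instance identity document URL, copied from the error message above.
EC2_IMDS_URL = "http://169.254.169.254/latest/dynamic/instance-identity/document"


def pick_metadata_url(env=None):
    """Return the metadata URL that matches the current platform.

    Assumption: ECS sets ECS_CONTAINER_METADATA_URI_V4 inside the container.
    When it is present, use the task metadata endpoint instead of EC2 IMDS.
    """
    env = os.environ if env is None else env
    ecs_base = env.get("ECS_CONTAINER_METADATA_URI_V4")
    if ecs_base:
        # The "/task" path returns task-level metadata.
        return ecs_base.rstrip("/") + "/task"
    # No ECS variable found: fall back to the EC2 endpoint.
    return EC2_IMDS_URL
```

Calling `pick_metadata_url()` inside a Fargate task would then return the task metadata URL rather than the 169.254.169.254 address that fails in the error above.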

Additional information

Is there an example CloudWatch agent config for ECS Fargate? I looked through the documentation in Doc1 and Doc2, but I am still stuck.

Thank you

sethAmazon commented 11 months ago

Thank you for bringing this issue to our attention.

Can you please turn on debug logs and post the full agent log?

{
    "agent": {
        "debug": true
    },
    "metrics": {
        "append_dimensions": {
            "ServiceName": "${aws:ContainerName}"
        },
        "metrics_collected": {
            "mem": {
                "measurement": [
                    "mem_used_percent"
                ]
            }
        }
    }
}

Please also try the tags public.ecr.aws/cloudwatch-agent/cloudwatch-agent:1.300026.3b189 and public.ecr.aws/cloudwatch-agent/cloudwatch-agent:1.247360.0b252689.

HieronyM commented 11 months ago

@sethAmazon, sorry for the late response. I just tried the tag public.ecr.aws/cloudwatch-agent/cloudwatch-agent:1.300026.3b189.

Here's the complete log:

2023/10/09 08:07:11 I! D! [EC2] Found active network interface
E! [EC2] Cannot get EC2 Metadata from IMDS: EC2 metadata is not available.
2023/10/09 08:07:11 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
I! Detected the instance is ECS
2023/10/09 08:07:11 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
Cannot access /etc/cwagentconfig: lstat /etc/cwagentconfig: no such file or directory 
2023/10/09 08:07:11 unable to scan config dir /etc/cwagentconfig with error: lstat /etc/cwagentconfig: no such file or directory
2023/10/09 08:07:11 Reading json config from from environment variable CW_CONFIG_CONTENT.
2023/10/09 08:07:11 I! Valid Json input schema.
I! Trying to detect region from ec2
I! Trying to detect region from ecs
2023/10/09 08:07:11 D! ec2tagger processor required because append_dimensions is set
2023/10/09 08:07:11 D! pipeline hostDeltaMetrics has no receivers
2023/10/09 08:07:11 Configuration validation first phase succeeded

2023/10/09 08:07:11 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml 
2023/10/09 08:07:11 D! config [agent]
  collection_jitter = "0s"
  debug = true
  flush_interval = "1s"
  flush_jitter = "0s"
  hostname = ""
  interval = "60s"
  logfile = ""
  logtarget = "lumberjack"
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  omit_hostname = true
  precision = ""
  quiet = false
  round_interval = false

[inputs]

  [[inputs.mem]]
    fieldpass = ["used_percent"]

[outputs]

  [[outputs.cloudwatch]]
2023/10/09 08:07:11 I! Config has been translated into YAML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.yaml 
2023/10/09 08:07:11 D! config connectors: {}
exporters:
    awscloudwatch:
        force_flush_interval: 1m0s
        max_datums_per_call: 1000
        max_values_per_datum: 150
        namespace: CWAgent
        region: ap-southeast-1
        resource_to_telemetry_conversion:
            enabled: true
extensions: {}
processors:
    ec2tagger:
        ec2_instance_tag_keys: []
        ec2_metadata_tags: []
        refresh_interval_seconds: 0s
receivers:
    telegraf_mem:
        collection_interval: 1m0s
        initial_delay: 1s
service:
    extensions: []
    pipelines:
        metrics/host:
            exporters:
                - awscloudwatch
            processors:
                - ec2tagger
            receivers:
                - telegraf_mem
    telemetry:
        logs:
            development: false
            disable_caller: false
            disable_stacktrace: false
            encoding: console
            error_output_paths: []
            initial_fields: {}
            level: debug
            output_paths: []
            sampling:
                initial: 2
                thereafter: 500
        metrics:
            address: ""
            level: None
            metric_readers: []
        resource: {}
        traces:
            propagators: []
2023-10-09T08:07:11Z I! CWAGENT_LOG_LEVEL is set to "DEBUG"
2023-10-09T08:07:11Z I! Starting AmazonCloudWatchAgent CWAgent/1.300026.3b189 (go1.20.7; linux; amd64)
2023-10-09T08:07:11Z I! AWS SDK log level not set
2023-10-09T08:07:11Z I! Creating new logs agent
2023-10-09T08:07:11Z I! [logagent] starting
2023-10-09T08:07:11.653Z    info    service/telemetry.go:96 Skipping telemetry setup.   {
    "address": "",
    "level": "None"
}
2023-10-09T08:07:11.653Z    debug   exporter@v0.79.0/exporter.go:273    Alpha component. May change in the future.  {
    "kind": "exporter",
    "data_type": "metrics",
    "name": "awscloudwatch"
}
2023-10-09T08:07:11.653Z    debug   processor/processor.go:287  Stable component.   {
    "kind": "processor",
    "name": "ec2tagger",
    "pipeline": "metrics/host"
}
2023-10-09T08:07:11Z D! Successfully created credential sessions
2023-10-09T08:07:11Z D! Using credential ASIA5KRXOV7TS7MVR6ET from CredentialsEndpointProvider
2023-10-09T08:07:11.657Z    debug   receiver@v0.79.0/receiver.go:294    Stable component.   {
    "kind": "receiver",
    "name": "telegraf_mem",
    "data_type": "metrics"
}
2023-10-09T08:07:11.658Z    info    service/service.go:131  Starting ...    {
    "Version": "",
    "NumCPU": 4
}
2023-10-09T08:07:11.658Z    info    extensions/extensions.go:30 Starting extensions...
2023-10-09T08:07:11Z D! Successfully created credential sessions
2023-10-09T08:07:11Z D! Using credential ASIA5KRXOV7TS7MVR6ET from CredentialsEndpointProvider
2023-10-09T08:07:11Z I! cloudwatch: get unique roll up list []
2023-10-09T08:07:11.660Z    info    ec2tagger/ec2tagger.go:444  ec2tagger: Check EC2 Metadata.  {
    "kind": "processor",
    "name": "ec2tagger",
    "pipeline": "metrics/host"
}
2023-10-09T08:07:11Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 32.409766497s
2023-10-09T08:07:11.922Z    error   ec2tagger/ec2tagger.go:447  ec2tagger: Unable to retrieve EC2 Metadata. This plugin must only be used on an EC2 instance.   {
    "kind": "processor",
    "name": "ec2tagger",
    "pipeline": "metrics/host"
}
github.com/aws/amazon-cloudwatch-agent/plugins/processors/ec2tagger.(*Tagger).deriveEC2MetadataFromIMDS
    github.com/aws/amazon-cloudwatch-agent/plugins/processors/ec2tagger/ec2tagger.go:447
github.com/aws/amazon-cloudwatch-agent/plugins/processors/ec2tagger.(*Tagger).Start
    github.com/aws/amazon-cloudwatch-agent/plugins/processors/ec2tagger/ec2tagger.go:296
go.opentelemetry.io/collector/component.StartFunc.Start
    go.opentelemetry.io/collector/component@v0.79.0/component.go:73
go.opentelemetry.io/collector/service/internal/graph.(*Graph).StartAll
    go.opentelemetry.io/collector@v0.79.0/service/internal/graph/graph.go:284
go.opentelemetry.io/collector/service.(*Service).Start
    go.opentelemetry.io/collector@v0.79.0/service/service.go:140
go.opentelemetry.io/collector/otelcol.(*Collector).setupConfigurationComponents
    go.opentelemetry.io/collector@v0.79.0/otelcol/collector.go:173
go.opentelemetry.io/collector/otelcol.(*Collector).Run
    go.opentelemetry.io/collector@v0.79.0/otelcol/collector.go:198
go.opentelemetry.io/collector/otelcol.NewCommand.func1
    go.opentelemetry.io/collector@v0.79.0/otelcol/command.go:27
github.com/spf13/cobra.(*Command).execute
    github.com/spf13/cobra@v1.7.0/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
    github.com/spf13/cobra@v1.7.0/command.go:1068
github.com/spf13/cobra.(*Command).Execute
    github.com/spf13/cobra@v1.7.0/command.go:992
main.runAgent
    github.com/aws/amazon-cloudwatch-agent/cmd/amazon-cloudwatch-agent/amazon-cloudwatch-agent.go:357
main.reloadLoop
    github.com/aws/amazon-cloudwatch-agent/cmd/amazon-cloudwatch-agent/amazon-cloudwatch-agent.go:172
main.main
    github.com/aws/amazon-cloudwatch-agent/cmd/amazon-cloudwatch-agent/amazon-cloudwatch-agent.go:580
runtime.main
    runtime/proc.go:250
2023-10-09T08:07:11.922Z    warn    ec2tagger/ec2tagger.go:449  ec2tagger: Timeout may have occurred because hop limit is too small. Please increase hop limit to 2 by following this document https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-options.html#configuring-IMDS-existing-instances.  {
    "kind": "processor",
    "name": "ec2tagger",
    "pipeline": "metrics/host"
}
2023-10-09T08:07:11.922Z    info    service/service.go:157  Starting shutdown...
2023-10-09T08:07:11.922Z    debug   adapter/receiver.go:76  Shutdown adapter    {
    "kind": "receiver",
    "name": "telegraf_mem",
    "data_type": "metrics",
    "receiver": "mem"
}
2023-10-09T08:07:11Z D! Stopping the CloudWatch output plugin
2023-10-09T08:07:11Z D! Stopped the CloudWatch output plugin
2023-10-09T08:07:11.965Z    info    extensions/extensions.go:44 Stopping extensions...
2023-10-09T08:07:11.965Z    info    service/service.go:171  Shutdown complete.
Error: cannot start pipelines: EC2MetadataRequestError: failed to get EC2 instance identity document
caused by: RequestError: send request failed
caused by: Get "http://169.254.169.254/latest/dynamic/instance-identity/document": dial tcp 169.254.169.254:80: connect: invalid argument
2023-10-09T08:07:11Z D! [outputs.cloudwatch] LogThrottleRetryer watch throttle events goroutine exiting
2023-10-09T08:07:11Z E! [telegraf] Error running agent: cannot start pipelines: EC2MetadataRequestError: failed to get EC2 instance identity document
caused by: RequestError: send request failed
caused by: Get "http://169.254.169.254/latest/dynamic/instance-identity/document": dial tcp 169.254.169.254:80: connect: invalid argument

HieronyM commented 11 months ago

@sethAmazon, here is the error log for the tag public.ecr.aws/cloudwatch-agent/cloudwatch-agent:1.247360.0b252689:

2023/10/09 08:17:23 I! D! [EC2] Found active network interface
E! [EC2] Cannot get EC2 Metadata from IMDS: EC2 metadata is not available.
2023/10/09 08:17:23 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
I! Detected the instance is ECS
2023/10/09 08:17:23 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
Cannot access /etc/cwagentconfig: lstat /etc/cwagentconfig: no such file or directory 
2023/10/09 08:17:23 unable to scan config dir /etc/cwagentconfig with error: lstat /etc/cwagentconfig: no such file or directory
2023/10/09 08:17:23 Reading json config from from environment variable CW_CONFIG_CONTENT.
2023/10/09 08:17:23 I! Valid Json input schema.
I! Trying to detect region from ec2
I! Trying to detect region from ecs
No csm configuration found.
No log configuration found.
Configuration validation first phase succeeded

2023/10/09 08:17:23 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml 
2023/10/09 08:17:23 D! toml config [agent]
  collection_jitter = "0s"
  debug = true
  flush_interval = "1s"
  flush_jitter = "0s"
  hostname = ""
  interval = "60s"
  logfile = ""
  logtarget = "lumberjack"
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  omit_hostname = true
  precision = ""
  quiet = false
  round_interval = false

[inputs]

  [[inputs.mem]]
    fieldpass = ["used_percent"]
    [inputs.mem.tags]
      metricPath = "metrics"

[outputs]

  [[outputs.cloudwatch]]
    force_flush_interval = "60s"
    namespace = "CWAgent"
    region = "ap-southeast-1"
    tagexclude = ["host", "metricPath"]
    [outputs.cloudwatch.tagpass]
      metricPath = ["metrics"]

[processors]

  [[processors.ec2tagger]]
    refresh_interval_seconds = "0s"
    [processors.ec2tagger.tagpass]
      metricPath = ["metrics"]
2023-10-09T08:17:23Z I! CWAGENT_LOG_LEVEL is set to "DEBUG"
2023-10-09T08:17:23Z I! Starting AmazonCloudWatchAgent CWAgent/1.247360.0b252689 (go1.20.5; linux; amd64)
2023-10-09T08:17:23Z I! AWS SDK log level not set
2023-10-09T08:17:23Z I! Loaded inputs: mem
2023-10-09T08:17:23Z I! Loaded aggregators: 
2023-10-09T08:17:23Z I! Loaded processors: ec2tagger
2023-10-09T08:17:23Z I! Loaded outputs: cloudwatch
2023-10-09T08:17:23Z I! Tags enabled: 
2023-10-09T08:17:23Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"", Flush Interval:1s
2023-10-09T08:17:23Z D! [agent] Initializing plugins
2023-10-09T08:17:23Z I! [processors.ec2tagger] ec2tagger: Check EC2 Metadata.
2023-10-09T08:17:23Z D! Successfully created credential sessions
2023-10-09T08:17:23Z I! [logagent] starting
2023-10-09T08:17:23Z D! Using credential ASIA5KRXOV7T72CPHVER from CredentialsEndpointProvider
2023-10-09T08:17:23Z E! [processors.ec2tagger] ec2tagger: Unable to retrieve EC2 Metadata. This plugin must only be used on an EC2 instance.
2023-10-09T08:17:23Z W! [processors.ec2tagger] ec2tagger: Timeout may have occurred because hop limit is too small. Please increase hop limit to 2 by following this document https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-options.html#configuring-IMDS-existing-instances.
2023-10-09T08:17:23Z E! [telegraf] Error running agent: could not initialize processor processors.ec2tagger: EC2MetadataRequestError: failed to get EC2 instance identity document
caused by: RequestError: send request failed
caused by: Get "http://169.254.169.254/latest/dynamic/instance-identity/document": dial tcp 169.254.169.254:80: connect: invalid argument

HieronyM commented 11 months ago

Both versions still request the metadata from http://169.254.169.254/latest/dynamic/instance-identity/document
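
The first debug log above contains the line `ec2tagger processor required because append_dimensions is set`, which suggests the EC2 lookup is only triggered by that key. As a possible stopgap (an assumption, not a confirmed fix), the key could be stripped from the JSON before it is passed in through `CW_CONFIG_CONTENT`. The helper name `without_append_dimensions` is hypothetical.

```python
import json


def without_append_dimensions(config_json):
    """Return the agent config as JSON with metrics.append_dimensions removed.

    Assumption: without this key the agent does not add the ec2tagger
    processor, so it never calls the EC2 metadata endpoint.
    """
    config = json.loads(config_json)
    metrics = config.get("metrics")
    if isinstance(metrics, dict):
        # pop() with a default is a no-op when the key is absent.
        metrics.pop("append_dimensions", None)
    return json.dumps(config, indent=4)
```

The trade-off is that the metrics are then published without the extra dimension, so this only helps if the dimension is optional for your dashboards and alarms.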

etiennechabert commented 11 months ago

This is a valid blocker that should be addressed 🙏

As an alternative, you could let users decide which global_dimensions make sense for them: https://github.com/aws/amazon-cloudwatch-agent/pull/673

At the moment you have locked down the append_dimensions interface to a few very specific use cases that look quite arbitrary, since at the moment:

github-actions[bot] commented 5 months ago

This issue was marked stale due to lack of activity.

github-actions[bot] commented 3 weeks ago

Closing this because it has stalled. Feel free to reopen if this issue is still relevant, or to ping the collaborator who labeled it stalled if you have any questions.