aws / amazon-cloudwatch-agent

CloudWatch Agent enables you to collect and export host-level metrics and logs on instances running Linux or Windows Server.
MIT License

I wish fetch-config would not delete the .json config file #1218

Open jedwards1211 opened 5 months ago

jedwards1211 commented 5 months ago

It's counterproductive that running fetch-config deletes the input .json config file.
When I'm debugging issues I want to just edit the file and rerun the fetch-config command. The fact that fetch-config deletes the file makes this more of a hassle.

To me, the configuration also seems overcomplicated (it gets converted to a different .toml format, and there's also a .yaml file there for some reason). It would be far more straightforward if we just specified a .json file, an SSM parameter, or similar as the configuration source, and the CloudWatch agent left that as the source of truth, i.e. always read from that file or SSM parameter on startup instead of fetching it and storing it in some other format.

okankoAMZ commented 4 months ago

Hi

Using the fetch-config command should not delete the config file. Could you provide some logs and outputs demonstrating this issue?

Thank you!

jedwards1211 commented 4 months ago

In the /opt/aws/amazon-cloudwatch-agent/etc directory:

[ec2-user@ip-172-31-44-197 etc]$ sudo cp amazon-cloudwatch-agent.json.bak amazon-cloudwatch-agent.json
[ec2-user@ip-172-31-44-197 etc]$ ls
amazon-cloudwatch-agent.d     amazon-cloudwatch-agent.json.bak  amazon-cloudwatch-agent.yaml  env-config.json
amazon-cloudwatch-agent.json  amazon-cloudwatch-agent.toml      common-config.toml            log-config.json
[ec2-user@ip-172-31-44-197 etc]$ sudo amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json -s
****** processing amazon-cloudwatch-agent ******
2024/07/18 18:31:17 I! imds retry client will retry 1 times
I! Trying to detect region from ec2
D! [EC2] Found active network interface
Successfully fetched the config and saved in /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json.tmp
Start configuration validation...
2024/07/18 18:31:17 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json.tmp ...
2024/07/18 18:31:17 I! Valid Json input schema.
2024/07/18 18:31:17 I! imds retry client will retry 1 times
2024/07/18 18:31:17 D! ec2tagger processor required because append_dimensions is set
2024/07/18 18:31:17 D! pipeline hostDeltaMetrics has no receivers
2024/07/18 18:31:17 Configuration validation first phase succeeded
I! Detecting run_as_user...
I! Trying to detect region from ec2
D! [EC2] Found active network interface
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase succeeded
Configuration validation succeeded
[ec2-user@ip-172-31-44-197 etc]$ ls
amazon-cloudwatch-agent.d         amazon-cloudwatch-agent.toml  common-config.toml  log-config.json
amazon-cloudwatch-agent.json.bak  amazon-cloudwatch-agent.yaml  env-config.json

You can see that amazon-cloudwatch-agent.json is gone in the output of the final ls.
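The restore-then-fetch steps in the transcript above could be wrapped in a small helper so that the edit-and-rerun loop keeps working while the file keeps disappearing. This is only a sketch of a workaround, not a fix: `refetch_config`, `CTL` and `ETC_DIR` are illustrative names that are not part of the agent, and the flags are simply the ones used in the transcript. It assumes you edit the .bak file and treat amazon-cloudwatch-agent.json as disposable.

```shell
#!/usr/bin/env bash
# Sketch: restore the working config from a backup, then re-run fetch-config.
# CTL, ETC_DIR and refetch_config are hypothetical names chosen for this example.
set -euo pipefail

CTL="${CTL:-amazon-cloudwatch-agent-ctl}"
ETC_DIR="${ETC_DIR:-/opt/aws/amazon-cloudwatch-agent/etc}"

refetch_config() {
  local backup="${1:-$ETC_DIR/amazon-cloudwatch-agent.json.bak}"
  local working="${2:-$ETC_DIR/amazon-cloudwatch-agent.json}"

  if [[ ! -f "$backup" ]]; then
    echo "backup config not found: $backup" >&2
    return 1
  fi

  # Recreate the working copy from the backup; fetch-config may remove it again.
  cp "$backup" "$working"

  # Same action, mode and flags as in the transcript above.
  "$CTL" -a fetch-config -m ec2 -c "file:$working" -s
}
```

Calling `refetch_config` with no arguments uses the paths from the transcript. The script would need to run as root, since the transcript needed sudo for both the copy and the fetch.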

amazon-cloudwatch-agent.log:

2024-07-18T18:31:17Z I! Profiler is stopped during shutdown
2024-07-18T18:31:17.681Z        info    otelcol/collector.go:227        Received signal from OS {"signal": "terminated"}
2024-07-18T18:31:17.682Z        info    service/service.go:157  Starting shutdown...
2024-07-18T18:31:17.692Z        info    extensions/extensions.go:44     Stopping extensions...
2024-07-18T18:31:17.693Z        info    service/service.go:171  Shutdown complete.
2024/07/18 18:31:19 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml 
2024/07/18 18:31:19 D! config [agent]
  collection_jitter = "0s"
  debug = false
  flush_interval = "1s"
  flush_jitter = "0s"
  hostname = ""
  interval = "300s"
  logfile = "/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log"
  logtarget = "lumberjack"
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  omit_hostname = false
  precision = ""
  quiet = false
  round_interval = false

[inputs]

  [[inputs.disk]]
    fieldpass = ["used_percent"]
    mount_points = ["/mnt/data-01"]
    tagexclude = ["mode"]

  [[inputs.logfile]]
    destination = "cloudwatchlogs"
    file_state_folder = "/opt/aws/amazon-cloudwatch-agent/logs/state"

    [[inputs.logfile.file_config]]
      file_path = "/var/log/cloud-init-output.log"
      from_beginning = true
      log_group_name = "clarity-2-db-syslog-r02"
      log_stream_name = "/var/log/cloud-init-output.log"
      pipe = false
      retention_in_days = 7

    [[inputs.logfile.file_config]]
      file_path = "/var/log/cfn-init.log"
      from_beginning = true
      log_group_name = "clarity-2-db-syslog-r02"
      log_stream_name = "/var/log/cfn-init.log"
      pipe = false
      retention_in_days = 7

    [[inputs.logfile.file_config]]
      file_path = "/var/log/cfn-init-cmd.log"
      from_beginning = true
      log_group_name = "clarity-2-db-syslog-r02"
      log_stream_name = "/var/log/cfn-init-cmd.log"
      pipe = false
      retention_in_days = 7

    [[inputs.logfile.file_config]]
      file_path = "/var/log/cfn-hup.log"
      from_beginning = true
      log_group_name = "clarity-2-db-syslog-r02"
      log_stream_name = "/var/log/cfn-hup.log"
      pipe = false
      retention_in_days = 7

    [[inputs.logfile.file_config]]
      file_path = "/var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log"
      from_beginning = true
      log_group_name = "clarity-2-db-syslog-r02"
      log_stream_name = "/var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log"
      pipe = false
      retention_in_days = 7

    [[inputs.logfile.file_config]]
      file_path = "/var/log/manage-db-reconf.log"
      from_beginning = true
      log_group_name = "clarity-2-db-syslog-r02"
      log_stream_name = "/var/log/manage-db-reconf.log"
      pipe = false
      retention_in_days = 7

  [[inputs.mem]]
    fieldpass = ["used_percent"]

[outputs]

  [[outputs.cloudwatch]]

  [[outputs.cloudwatchlogs]]
    force_flush_interval = "5s"
    log_stream_name = "i-07597f6c4d5733042"
    region = "us-west-2"
2024/07/18 18:31:19 I! Config has been translated into YAML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.yaml 
2024/07/18 18:31:19 D! config connectors: {}
exporters:
    awscloudwatch:
        force_flush_interval: 1s
        max_datums_per_call: 1000
        max_values_per_datum: 150
        namespace: CWAgent
        region: us-west-2
        resource_to_telemetry_conversion:
            enabled: true
        rollup_dimensions:
            - - InstanceId
              - path
extensions: {}
processors:
    ec2tagger:
        ec2_instance_tag_keys: []
        ec2_metadata_tags:
            - InstanceId
        imds_retries: 1
        refresh_interval_seconds: 0s
receivers:
    telegraf_disk:
        collection_interval: 5m0s
        initial_delay: 1s
    telegraf_mem:
        collection_interval: 5m0s
        initial_delay: 1s
service:
    extensions: []
    pipelines:
        metrics/host:
            exporters:
                - awscloudwatch
            processors:
                - ec2tagger
            receivers:
                - telegraf_disk
                - telegraf_mem
    telemetry:
        logs:
            development: false
            disable_caller: false
            disable_stacktrace: false
            encoding: console
            error_output_paths: []
            initial_fields: {}
            level: info
            output_paths:
                - /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log
            sampling:
                initial: 2
                thereafter: 500
        metrics:
            address: ""
            level: None
            metric_readers: []
        resource: {}
        traces:
            propagators: []
2024/07/18 18:31:19 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2024/07/18 18:31:19 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
2024/07/18 18:31:19 I! Valid Json input schema.
2024/07/18 18:31:19 I! Detected runAsUser: root
2024/07/18 18:31:19 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 0:0
2024-07-18T18:31:19Z I! Starting AmazonCloudWatchAgent CWAgent/1.300028.1 (go1.20.8; linux; amd64)
2024-07-18T18:31:19Z I! AWS SDK log level not set
2024-07-18T18:31:19Z I! creating new logs agent
2024-07-18T18:31:19Z I! [logagent] starting
2024-07-18T18:31:19Z I! [logagent] found plugin cloudwatchlogs is a log backend
2024-07-18T18:31:19Z I! [logagent] found plugin logfile is a log collection
2024-07-18T18:31:19Z I! [logagent] start logs plugin file paths [/var/log/cloud-init-output.log /var/log/cfn-init.log /var/log/cfn-init-cmd.log /var/log/cfn-hup.log /var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log /var/log/manage-db-reconf.log]
2024-07-18T18:31:19Z I! [inputs.logfile] turned on logs plugin
2024-07-18T18:31:19.552Z        info    service/telemetry.go:96 Skipping telemetry setup.       {"address": "", "level": "None"}
2024-07-18T18:31:19Z I! imds retry client will retry 1 times
2024-07-18T18:31:19.559Z        info    service/service.go:131  Starting CWAgent...     {"Version": "1.300028.1", "NumCPU": 2}
2024-07-18T18:31:19.559Z        info    extensions/extensions.go:30     Starting extensions...
2024-07-18T18:31:19Z I! cloudwatch: get unique roll up list [[InstanceId path]]
2024-07-18T18:31:19.572Z        info    ec2tagger/ec2tagger.go:435      ec2tagger: Check EC2 Metadata.  {"kind": "processor", "name": "ec2tagger", "pipeline": "metrics/host"}
2024-07-18T18:31:19Z I! cloudwatch: publish with ForceFlushInterval: 1s, Publish Jitter: 35.296087ms
2024-07-18T18:31:19.575Z        info    ec2tagger/ec2tagger.go:411      ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes      {"kind": "processor", "name": "ec2tagger", "pipeline": "metrics/host"}
2024-07-18T18:31:19.575Z        info    service/service.go:148  Everything is ready. Begin running and processing data.
2024-07-18T18:31:20Z I! [inputs.logfile] Reading from offset 51573 in /var/log/cloud-init-output.log
2024-07-18T18:31:20Z I! [inputs.logfile] Reading from offset 365 in /var/log/cfn-init.log
2024-07-18T18:31:20Z I! [inputs.logfile] Reading from offset 30110 in /var/log/cfn-hup.log
2024-07-18T18:31:20Z I! [inputs.logfile] Reading from offset 25243 in /var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log
2024-07-18T18:31:20Z I! First time setting retention for log group clarity-2-db-syslog-r02, update map to avoid setting twice
2024-07-18T18:31:20Z I! [logagent] piping log from clarity-2-db-syslog-r02//var/log/cloud-init-output.log(/var/log/cloud-init-output.log) to cloudwatchlogs with retention 7
2024-07-18T18:31:20Z I! [logagent] piping log from clarity-2-db-syslog-r02//var/log/cfn-init.log(/var/log/cfn-init.log) to cloudwatchlogs with retention -1
2024-07-18T18:31:20Z I! [logagent] piping log from clarity-2-db-syslog-r02//var/log/cfn-init-cmd.log(/var/log/cfn-init-cmd.log) to cloudwatchlogs with retention -1
2024-07-18T18:31:20Z I! [logagent] piping log from clarity-2-db-syslog-r02//var/log/cfn-hup.log(/var/log/cfn-hup.log) to cloudwatchlogs with retention -1
2024-07-18T18:31:20Z I! [logagent] piping log from clarity-2-db-syslog-r02//var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log(/var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log) to cloudwatchlogs with retention -1
2024-07-18T18:31:20Z I! [logagent] piping log from clarity-2-db-syslog-r02//var/log/manage-db-reconf.log(/var/log/manage-db-reconf.log) to cloudwatchlogs with retention -1

okankoAMZ commented 4 months ago

Hi! Thank you for providing the logs. By design, fetch-config shouldn't delete the JSON file. I will try to re-create this issue and get back to you as soon as possible.

platymatt commented 3 months ago

Any updates on this issue? We are experiencing the same thing. Or is the expected behavior that the config.json file gets converted into the .toml file and the .json file is then removed because it is no longer needed?

jedwards1211 commented 2 months ago

@okankoAMZ I just noticed this in the journal after re-fetching the config... the main PID is logging:

/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.

I want to emphasize again, the number of different files and formats CWAgent seems to shuffle the config through doesn't inspire confidence. It seems like asking for bugs.

● amazon-cloudwatch-agent.service - Amazon CloudWatch Agent
     Loaded: loaded (/etc/systemd/system/amazon-cloudwatch-agent.service; enabled; preset: disabled)
     Active: active (running) since Tue 2024-09-17 00:48:53 UTC; 5s ago
   Main PID: 435744 (amazon-cloudwat)
      Tasks: 8 (limit: 2257)
     Memory: 105.1M
        CPU: 888ms
     CGroup: /system.slice/amazon-cloudwatch-agent.service
             └─435744 /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml -envconfig /opt/aws/amazon-cloudwatch-agent/etc/env-config.json -otelconfig /opt/aws/amazon-cloud>

Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435749]: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435749]: 2024/09/17 00:48:54 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435749]: 2024/09/17 00:48:54 I! Valid Json input schema.
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435749]: I! Detecting run_as_user...
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435749]: I! Trying to detect region from ec2
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435749]: 2024/09/17 00:48:54 D! ec2tagger processor required because append_dimensions is set
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435749]: 2024/09/17 00:48:54 D! pipeline hostDeltaMetrics has no receivers
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435749]: 2024/09/17 00:48:54 Configuration validation first phase succeeded
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435744]: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
Sep 17 00:48:54 ip-172-31-32-255.us-west-2.compute.internal start-amazon-cloudwatch-agent[435744]: I! Detecting run_as_user...
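The journal lines above show the agent skipping the missing amazon-cloudwatch-agent.json and reading amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json instead. A quick way to check whether that stored copy still matches the config you meant to load might look like the sketch below. `check_stored_config` and `ETC_DIR` are hypothetical names, and the comparison assumes fetch-config stores the JSON byte-for-byte, which this thread does not confirm.

```shell
#!/usr/bin/env bash
# Sketch: compare a source config with the copy the agent reads on startup.
# check_stored_config and ETC_DIR are hypothetical names chosen for this example.
set -euo pipefail

ETC_DIR="${ETC_DIR:-/opt/aws/amazon-cloudwatch-agent/etc}"

check_stored_config() {
  local source_cfg="$1"
  # Default path taken from the "Reading json config file path" log lines in this thread.
  local stored_cfg="${2:-$ETC_DIR/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json}"

  if [[ ! -f "$source_cfg" ]]; then
    echo "missing: $source_cfg"
    return 2
  fi
  if [[ ! -f "$stored_cfg" ]]; then
    echo "missing: $stored_cfg"
    return 2
  fi

  # Assumption: the stored copy is a verbatim copy, so a byte comparison is meaningful.
  if cmp -s "$source_cfg" "$stored_cfg"; then
    echo "in sync"
  else
    echo "differs"
    return 1
  fi
}
```

For example, `check_stored_config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json.bak` would compare the backup from the earlier transcript against the stored copy. If the stored file turns out to be reformatted, a "differs" result does not necessarily mean the config content changed.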

solomongit3 commented 2 months ago

Hi, any update on the file getting deleted?

Riskcomplexx commented 1 month ago

This occurs on EC2 (Linux 2023) as well by default.