GoogleCloudPlatform / ops-agent

Some metrics cannot be excluded #403

Open mirkogrcic opened 2 years ago

mirkogrcic commented 2 years ago

Hello, we are using this config


logging:
  receivers:
    syslog:
      type: files
      include_paths:
      - /var/log/messages
      - /var/log/syslog
  service:
    pipelines:
      default_pipeline:
        receivers: [syslog]
metrics:
  receivers:
    hostmetrics:
      type: hostmetrics
      collection_interval: 60s
  processors:
    metrics_filter:
      type: exclude_metrics
      metrics_pattern:
      - agent.googleapis.com/cpu/*
      - agent.googleapis.com/interface/*
      - agent.googleapis.com/network/*
      - agent.googleapis.com/memory/bytes_used/*
      - agent.googleapis.com/swap/*
      - agent.googleapis.com/disk/io_time/*
      - agent.googleapis.com/disk/merged_operations/*
      - agent.googleapis.com/pagefile/*
      - agent.googleapis.com/disk/weighted_io_time/*
      - agent.googleapis.com/disk/write_bytes_count/*
      - agent.googleapis.com/disk/read_bytes_count/*
      - agent.googleapis.com/processes/*
  service:
    pipelines:
      default_pipeline:
        receivers: [hostmetrics]
        processors: [metrics_filter]

The agent.googleapis.com/processes/* exclusion, and the other patterns with two path components (not counting the /*), work. Patterns with three path components, such as agent.googleapis.com/memory/bytes_used/*, do not.

Being excluded correctly:

Not being excluded:

I'm assuming the problem is that metrics like agent.googleapis.com/disk/read_bytes_count do not branch out any further, so the trailing /* never matches because it expects another path component. Removing the /* throws an error when starting google-cloud-ops-agent, as it's hardcoded to require it here. Metrics like agent.googleapis.com/processes/* are excluded because they branch out into agent.googleapis.com/processes/count_by_state, so the /* matches.

vohtaski commented 2 years ago

Same problem here. It would also be great to be able to exclude more things or, even better, to have an option to specify what to include.

In our scenarios, we don't care about loop* disks, but only about sda* disks.

In the legacy monitoring agent we configured collectd this way, but we can't find an option to do the same in the new Ops Agent.

LoadPlugin df
<Plugin "df">
  Device "/dev/sda1" # we are interested only in the main drive
  ReportByDevice true
  ValuesPercentage true
</Plugin>

quentinmit commented 2 years ago

Only being able to exclude a directory of metrics at a time is currently working as intended (WAI).
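
As a sketch of what that means for the config in the original report, the processor below uses only directory-level patterns. It reuses the keys from that config. Collapsing the individual disk patterns into agent.googleapis.com/disk/* is an assumption that dropping every disk metric is acceptable, which it may not be.

```yaml
metrics:
  receivers:
    hostmetrics:
      type: hostmetrics
      collection_interval: 60s
  processors:
    metrics_filter:
      type: exclude_metrics
      metrics_pattern:
      # Directory-level patterns, reported as working in this thread.
      - agent.googleapis.com/cpu/*
      - agent.googleapis.com/processes/*
      # Assumption: stands in for the per-metric disk patterns
      # (disk/io_time/*, disk/read_bytes_count/*, ...), which were not
      # applied. Note that this drops ALL disk metrics, not just those.
      - agent.googleapis.com/disk/*
  service:
    pipelines:
      default_pipeline:
        receivers: [hostmetrics]
        processors: [metrics_filter]
```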

vohtaski commented 2 years ago

Any plans to provide more granularity soon? Logging all this data from multiple machines significantly increases the monthly bill, while all this loop information is pretty useless for us; the only important thing is sda1.

panthony commented 1 year ago

I had the same issue with /dev/loop* devices, and because it gets really expensive I decided to fork the agent to exclude them by default.

https://github.com/cogniteev/ops-agent/commit/dd747cdd6c274d1985ed0c93246251f41032f16f

The ideal scenario would probably be to be able to override the default scrapers altogether from the ops-agent configuration file.

Or, even better, to be able to define the hostmetrics receiver several times, as is possible with opentelemetry-collector, where you can define different frequencies depending on the metrics:

https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver#different-frequencies
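
For reference, the upstream feature looks roughly like the sketch below. This is OpenTelemetry Collector configuration, not Ops Agent configuration, and the Ops Agent does not accept it as-is. The intervals and the choice of scrapers are illustrative, and the exclude_devices filter is an assumption about how loop devices could be dropped there; check it against the receiver's README before relying on it.

```yaml
receivers:
  # Frequently changing metrics, scraped often.
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:

  # Slower-moving disk metrics, scraped less often.
  hostmetrics/disk:
    collection_interval: 1m
    scrapers:
      disk:
      filesystem:
        # Assumption: skip loop devices by name; verify the option
        # names and the device name format in the README.
        exclude_devices:
          devices: ["^/dev/loop.*"]
          match_type: regexp

service:
  pipelines:
    metrics:
      receivers: [hostmetrics, hostmetrics/disk]
      # An exporter is also required in a complete config; omitted here.
```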