bryanklewis / prometheus-eventhubs-adapter

Use Azure Event Hubs as a remote storage for Prometheus
Apache License 2.0

Filter samples by name #18

Closed AlvinRamoutar closed 3 years ago

AlvinRamoutar commented 3 years ago

For clusters which write out a plethora of metrics, this will allow us to choose what we process and send to Event Hubs.

Goals are as such:

  1. Reduce processing effort on prometheus-eventhubs-adapter for metrics we don't care about
  2. Reduce cost of Event Hubs ingress for metrics we don't care about
  3. Work-around the limitation of Prometheus' remoteWrite which doesn't have this functionality
AlvinRamoutar commented 3 years ago

Wrote up a quick PR implementing this functionality, intended to be used as follows:

filterType — Choose which filter to enable. Default is none (0). Whitelist (1) processes only samples whose metric name is provided in filterBy. Blacklist (2) processes only samples whose metric name is not provided in filterBy.

filterBy — Comma-delimited list of metric names (case-insensitive) to filter by.
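The filter described above could be sketched roughly like this (the names filterType, filterBy, and the helper below follow the comment's wording, not necessarily the actual PR):

```go
package main

import (
	"fmt"
	"strings"
)

// Filter modes matching the proposed filterType setting:
// 0 = none, 1 = whitelist, 2 = blacklist.
const (
	FilterNone = iota
	FilterWhitelist
	FilterBlacklist
)

// nameFilter decides whether a sample with a given metric name should
// be processed, based on a comma-delimited, case-insensitive filterBy list.
type nameFilter struct {
	mode  int
	names map[string]struct{}
}

func newNameFilter(mode int, filterBy string) *nameFilter {
	names := make(map[string]struct{})
	for _, n := range strings.Split(filterBy, ",") {
		n = strings.ToLower(strings.TrimSpace(n))
		if n != "" {
			names[n] = struct{}{}
		}
	}
	return &nameFilter{mode: mode, names: names}
}

// keep reports whether the sample should be forwarded to Event Hubs.
func (f *nameFilter) keep(metricName string) bool {
	_, listed := f.names[strings.ToLower(metricName)]
	switch f.mode {
	case FilterWhitelist:
		return listed
	case FilterBlacklist:
		return !listed
	default: // FilterNone: process everything
		return true
	}
}

func main() {
	f := newNameFilter(FilterWhitelist, "node_cpu_seconds_total, up")
	fmt.Println(f.keep("UP"))            // true: matching is case-insensitive
	fmt.Println(f.keep("go_goroutines")) // false: not in the whitelist
}
```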

bryanklewis commented 3 years ago

Hey @AlvinRamoutar, thank you again for your input and contribution. Good news: this functionality is actually built into Prometheus under "write_relabel_configs". I use it myself in production. I relabel and drop metrics on scrape for the ones I don't want to save in Prometheus. Then, when remote write is called, you can run another round of relabel processing to decide what gets sent to Event Hubs. Example below:

global:
  scrape_interval:     5m
  evaluation_interval: 10m
  scrape_timeout: 10s

remote_write:
  - url: 'http://localhost/write'
    remote_timeout: 20s
    name: eventhub-1
    queue_config:
       capacity: 1000
       max_shards: 1000
       max_samples_per_send: 200
       batch_send_deadline: 1s
       min_backoff: 1m
       max_backoff: 24h
    write_relabel_configs:  # <-- Here is what you want
       # This is processed after staged for saving in database,
       # but before sending to remote storage
       - action: keep #or drop
         source_labels: [__name__]
         regex: 'metric_something_.*'

scrape_configs:
  - job_name: 'linux'
    file_sd_configs:
    - files:
      - '/etc/prometheus/targets.json'
      refresh_interval: 5m

    relabel_configs:
    # Set instance label
    # This is processed before scrape
    - action: replace
      source_labels: [__address__]
      #remove port
      regex: '([^:]+):\d+'
      target_label: instance

    metric_relabel_configs:
    # Drop Golang stats
    # this is processed after scrape, before saving to the database
    - source_labels: [__name__]
      regex: '(go_gc_|go_info|process_virtual_|go_memstats_(buck|frees|gc|lookups|mallocs|mcache|mspan|next|other|stack|sys|heap)_[^o]).*'
      action: drop

I don't think there is a need to duplicate this in the adapter. If you're addressing something else or I'm missing the point, please correct me.
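One detail worth keeping in mind with the `keep`/`drop` examples above: Prometheus anchors relabel regexes, so the pattern must match the entire joined value of source_labels, not just a substring. A rough sketch of that matching behavior (an illustration of the semantics, not Prometheus' actual code):

```go
package main

import (
	"fmt"
	"regexp"
)

// keepSample mimics `action: keep` relabel semantics: Prometheus wraps
// the configured regex in ^(?:...)$, so it must match the whole value.
func keepSample(metricName, pattern string) bool {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	return re.MatchString(metricName)
}

func main() {
	// 'metric_something_.*' keeps names with that prefix...
	fmt.Println(keepSample("metric_something_total", `metric_something_.*`)) // true
	// ...but an unanchored-looking pattern still will not match mid-string.
	fmt.Println(keepSample("prefix_metric_something_total", `metric_something_.*`)) // false
}
```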

bryanklewis commented 3 years ago

Closing; no response from submitter.