canonical / grafana-agent-k8s-operator

https://charmhub.io/grafana-agent-k8s
Apache License 2.0

Add alert rule for arp cache #193

Closed rgildein closed 1 year ago

rgildein commented 1 year ago

Add an alert for the ARP cache reaching the 80% threshold. This alert rule needs grafana-agent version 0.33.1 or higher, and the sysctl collector must be enabled with the following configuration:

    enable_collectors:
      - arp
      - sysctl
    sysctl_include:
      - net.ipv4.neigh.default.gc_thresh3
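For orientation, a rule matching this description could look roughly like the sketch below. It is not the rule file from this PR. The alert name, severity, and annotation wording are taken from the unit test in the testing instructions; the group name, the `sum by (instance)` aggregation, the `on (instance)` matching, the `>= 80` comparison, and `for: 2m` are assumptions chosen to be consistent with that test.

```yaml
# Sketch of arp_cache.rules (assumed content, not copied from the PR).
groups:
  - name: HostArpCache  # hypothetical group name
    rules:
      - alert: HostArpCache
        # ARP entries summed over all devices, as a percentage of gc_thresh3.
        # node_arp_entries comes from the arp collector; the threshold metric
        # is exposed by the sysctl collector.
        expr: >-
          sum by (instance) (node_arp_entries)
          / on (instance) node_sysctl_net_ipv4_neigh_default_gc_thresh3
          * 100 >= 80
        for: 2m  # assumption: inferred from the test timeline
        labels:
          severity: critical
        annotations:
          summary: 'Host arp cache reached {{ printf "%.0f" $value }}% limit (instance {{ $labels.instance }})'
          description: |-
            Host arp cache reached {{ printf "%.0f" $value }}% limit.
              VALUE = {{ $value }}
              LABELS = {{ $labels }}
```

With the values used in the test, 950 / 1024 * 100 = 92.7734375, which is where the 93% in the expected summary comes from.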

These changes are blocked by PR.

Context

Moving memory NRPE checks from charm-nrpe.

Testing Instructions

Tested with the following unit test file:

    rule_files:
      - arp_cache.rules

    evaluation_interval: 1m

    tests:
      - interval: 1m
        input_series:
          - series: 'node_arp_entries{instance="test-model_1234_test-app_test-app/0"}'
            values: '3 3 3 3 950 950 950 3 3 3'
          - series: 'node_sysctl_net_ipv4_neigh_default_gc_thresh3{instance="test-model_1234_test-app_test-app/0"}'
            values: '1024x10'
        alert_rule_test:
          - eval_time: 5m
            alertname: HostArpCache
            exp_alerts: []  # no alert
          - eval_time: 6m
            alertname: HostArpCache
            exp_alerts:
              - exp_labels:
                  severity: critical
                  instance: test-model_1234_test-app_test-app/0
                exp_annotations:
                  summary: Host arp cache reached 93% limit (instance test-model_1234_test-app_test-app/0)
                  description: >-
                    Host arp cache reached 93% limit.
                      VALUE = 92.7734375
                      LABELS = map[instance:test-model_1234_test-app_test-app/0]
          - eval_time: 7m
            alertname: HostArpCache
            exp_alerts: []  # no alert

and promtool:

    x1:➜  prometheus_alert_rules git:(nrpe/arp_cache-aler-rules) ✗ promtool test rules ./test_arp_cache.yaml
    Unit Testing:  ./test_arp_cache.yaml
      SUCCESS
    [0.09s]

Release Notes

rgildein commented 1 year ago

I believe that this is also blocked by #202.