canonical / grafana-agent-k8s-operator

https://charmhub.io/grafana-agent-k8s
Apache License 2.0

feat: Add prometheus alert rules for nvme #206

Closed jneo8 closed 1 year ago

jneo8 commented 1 year ago

Context

Add Prometheus alert rules for NVMe devices.

Testing Instructions

Tested with the following promtool unit test file:

rule_files:
  - ./nvme.rule
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: node_hwmon_temp_alarm{chip="nvme_nvme1", sensor="temp1"}
        values: 1
    alert_rule_test:
      - eval_time: 1m
        alertname: NvmeHwmonTempAlarm
        exp_alerts:
          - exp_labels:
              alertname: NvmeHwmonTempAlarm
              chip: "nvme_nvme1"
              severity: warning
              sensor: "temp1"
            exp_annotations:
              summary: Chip nvme_nvme1 throw a temperature alarm 1
  - interval: 1m
    input_series:
      - series: node_filesystem_avail_bytes{device="/dev/nvme1n1p2",fstype="ext4",mountpoint="/"}
        values: 19
      - series: node_filesystem_size_bytes{device="/dev/nvme1n1p2",fstype="ext4",mountpoint="/"}
        values: 100
    alert_rule_test:
      - eval_time: 5m
        alertname: FileSystemPercentUsedWarn
        exp_alerts:
          - exp_labels:
              alertname: FileSystemPercentUsedWarn
              severity: warning
              device: "/dev/nvme1n1p2"
              fstype: "ext4"
              mountpoint: "/"
            exp_annotations:
              summary: Available disk on / is too low
              description: Available disk percentage on mountpoint(/) 19 is < 20%
  - interval: 1m
    input_series:
      - series: node_filesystem_avail_bytes{device="/dev/nvme1n1p2",fstype="ext4",mountpoint="/boot/efi"}
        values: 5
      - series: node_filesystem_size_bytes{device="/dev/nvme1n1p2",fstype="ext4",mountpoint="/boot/efi"}
        values: 100
    alert_rule_test:
      - eval_time: 5m
        alertname: FileSystemPercentUsedCrit
        exp_alerts:
          - exp_labels:
              alertname: FileSystemPercentUsedCrit
              severity: critical
              device: "/dev/nvme1n1p2"
              fstype: "ext4"
              mountpoint: "/boot/efi"
            exp_annotations:
              summary: Available disk on /boot/efi is too low
              description: Available disk percentage on mountpoint(/boot/efi) 5 is < 10%
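
The nvme.rule file referenced by rule_files is not shown in this PR description. For readers who want to follow the tests, here is a minimal sketch of what such a rule file could look like. Only the alert names, severities, and annotation texts are taken from the expected results above; the group name, the PromQL expressions, the chip matcher, and the for durations are assumptions and may differ from the rules actually proposed.

```yaml
# Hypothetical nvme.rule, reconstructed from the test expectations above.
# Expressions, the group name, and "for" durations are assumptions.
groups:
  - name: nvme  # assumed group name
    rules:
      - alert: NvmeHwmonTempAlarm
        # Assumed: fire when an NVMe hwmon chip reports its temperature alarm flag.
        expr: 'node_hwmon_temp_alarm{chip=~"nvme.*"} == 1'
        labels:
          severity: warning
        annotations:
          summary: "Chip {{ $labels.chip }} throw a temperature alarm {{ $value }}"
      - alert: FileSystemPercentUsedWarn
        # Assumed: percentage of available space, computed so that 19/100 yields 19.
        expr: '100 * node_filesystem_avail_bytes / node_filesystem_size_bytes < 20'
        for: 2m  # assumed duration
        labels:
          severity: warning
        annotations:
          summary: "Available disk on {{ $labels.mountpoint }} is too low"
          description: "Available disk percentage on mountpoint({{ $labels.mountpoint }}) {{ $value }} is < 20%"
      - alert: FileSystemPercentUsedCrit
        # Same assumed expression with the stricter 10% threshold.
        expr: '100 * node_filesystem_avail_bytes / node_filesystem_size_bytes < 10'
        for: 2m  # assumed duration
        labels:
          severity: critical
        annotations:
          summary: "Available disk on {{ $labels.mountpoint }} is too low"
          description: "Available disk percentage on mountpoint({{ $labels.mountpoint }}) {{ $value }} is < 10%"
```

A rule file and its test file of this shape are normally checked by passing the test file to promtool test rules. Note that with these assumed thresholds, a filesystem at 5% available would also satisfy the warning rule; the third test only inspects FileSystemPercentUsedCrit, so that overlap does not affect it.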

Release Notes

jneo8 commented 1 year ago

I am going to close this PR and create two separate PRs, one for hwmon and one for the file system rules. These metrics are not specific to NVMe; they relate to NVMe only when the chip is an NVMe device.