Open krakazyabra opened 5 years ago
The value range follows Prometheus best practices, whereby percentages are expressed as values between 0 (= 0%) and 1 (= 100%).
Hence, seeing 1 for an inactive host is expected. As for why you are seeing the same value for working hosts, I cannot say. Either there is (or was) a bug in the exporter or the underlying ping library (unlikely, since we're using it extensively), or the host running the ping_exporter binary can't reach the target host.
The alert you're trying to model would be the following:
    # /path/to/prometheus/ping.rules
    ---
    groups:
      - name: Ping
        rules:
          - alert: PingPacketLost
            # 1 means 100% packet loss; PromQL comparisons use ==, not =
            expr: ping_loss_percent{job="ping"} == 1
            for: 2m
            labels:
              severity: critical
            annotations:
              summary: "{{ $labels.target }} ({{ $labels.ip }}) not reachable"
Arguably, this metric is badly named, since it is not a percent at all. A better name, and one more consistent with Prometheus best practices, would be ping_loss_ratio.
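Because the value is a ratio, partial loss can be alerted on with a fractional threshold, and the alert text can still show a percentage. The following is only a sketch: the group name, alert name, `warning` severity and the 0.5 threshold are illustrative assumptions, not something the exporter prescribes. It relies on the `humanizePercentage` template function available in recent Prometheus versions, which renders a 0..1 ratio as a percentage.

```yaml
# Hypothetical additional rule; names and threshold are assumptions.
groups:
  - name: PingPartialLoss
    rules:
      - alert: PingPacketLossHigh
        # 0.5 corresponds to 50% packet loss on the 0..1 scale
        expr: ping_loss_percent{job="ping"} > 0.5
        for: 2m
        labels:
          severity: warning
        annotations:
          # humanizePercentage turns the ratio in $value into a percent string
          summary: "{{ $labels.target }} ({{ $labels.ip }}) is losing {{ $value | humanizePercentage }} of packets"
```

For dashboards, multiplying the metric by 100 in the query (`ping_loss_percent{job="ping"} * 100`) gives the same conversion.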
Hi! Thanks for the exporter! Could you please explain how ping_loss_percent reports its value? Why do I see 1 on an inactive host? Shouldn't it be 100? And why do I also see the value 1 on working hosts? I assumed it would be a percent value, like rate(ping_loss_percent[2m]) 100, which would mean that 100% of packets were lost in the last 2 minutes.