czerwonk / ping_exporter

Prometheus exporter for ICMP echo requests using https://github.com/digineo/go-ping
MIT License

ping_loss_percent #22

Open krakazyabra opened 5 years ago

krakazyabra commented 5 years ago

Hi! Thanks for the exporter! Could you please explain how ping_loss_percent reports its value? Why do I see 1 for an inactive host? Shouldn't it be 100? And why do I also see 1 for working hosts? I assumed it would be a percentage, like rate(ping_loss_percent[2m]) 100, meaning that 100% of packets were lost in the last 2 minutes.

dmke commented 3 years ago

The value range follows Prometheus best practices, whereby percentages are expressed as values between 0 (= 0%) and 1 (= 100%).

Hence, seeing 1 for an inactive host is expected. As for why you are seeing the same value for working hosts, I cannot say. Either there is (was) a bug in the exporter or underlying ping library (unlikely, we're using this extensively), or the host running the ping_exporter binary can't reach the target host.
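
If a 0 to 100 scale is easier to read on a dashboard, the ratio can be multiplied by 100 on the Prometheus side. The following is only a sketch: the file path and the recorded series name ping_loss:percent_0_100 are hypothetical, and the job="ping" selector assumes the same scrape job as the rest of this thread.

```yaml
# /path/to/prometheus/ping_percent.rules (hypothetical file name)
---
groups:
- name: PingPercent
  rules:
  # Rescale the 0..1 ratio to 0..100; the series name is made up for illustration.
  - record: ping_loss:percent_0_100
    expr: ping_loss_percent{job="ping"} * 100
```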

The alert you're trying to model would be the following:

# /path/to/prometheus/ping.rules
---
groups:
- name: Ping
  rules:
  - alert: PingPacketLost
    expr: ping_loss_percent{job="ping"} == 1
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "{{ $labels.target }} ({{ $labels.ip }}) not reachable"
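
The rule above fires only on total loss. For partial loss, a variant along the following lines might work; it is a sketch, not something taken from the exporter's documentation. The alert name, the 0.5 threshold, the 2m window, and the warning severity are assumptions chosen for illustration. avg_over_time averages the ratio over the window, and the humanizePercentage template function renders the 0 to 1 value as a percentage in the alert text.

```yaml
# /path/to/prometheus/ping_partial.rules (hypothetical file name)
---
groups:
- name: PingPartialLoss
  rules:
  - alert: PingPacketLossHigh
    # Average loss ratio over the last 2 minutes; 0.5 (= 50%) is an arbitrary example threshold.
    expr: avg_over_time(ping_loss_percent{job="ping"}[2m]) > 0.5
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.target }} ({{ $labels.ip }}) is losing {{ $value | humanizePercentage }} of packets"
```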

foogod commented 2 years ago

Arguably, this metric is badly named, since it is not a percentage at all. A better name, and one more consistent with Prometheus best practices, would be ping_loss_ratio.