SuperQ / smokeping_prober

Prometheus style smokeping
Apache License 2.0

Heatmap is empty for all hosts #100

Status: Open. Strykar opened this issue 1 year ago.

Strykar commented 1 year ago

The Packet Loss and Latency charts plot fine. Screenshot: https://imgur.com/a/QlZJCnS

Invoked via systemd as:

smokeping_prober --privileged --config.file=/path/to/smokeping_prober.yaml --web.listen-address=:9374 --web.telemetry-path="/metrics"

Also tried with:

--buckets="5e-05,0.0001,0.0002,0.0004,0.0008,0.0016,0.0032,0.0064,0.0128,0.0256,0.0512,0.1024,0.2048,0.4096,0.8192,1.6384,3.2768,6.5536,13.1072,26.2144" 
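The --buckets list above is a geometric series: it starts at 5e-05 s (50 µs) and doubles 20 times, up to 26.2144 s. The /metrics output posted later in this thread shows exactly the same le boundaries, which suggests this list is already the prober's default, so passing it explicitly would not be expected to change anything. A minimal Python sketch that regenerates the flag value, handy if you want to tune the start or the count; the helper names are hypothetical and only mirror the (start, factor, count) shape of client_golang's ExponentialBuckets:

```python
from __future__ import annotations


def exponential_buckets(start: float, factor: float, count: int) -> list[float]:
    """Return `count` upper bounds: start, start*factor, start*factor**2, ..."""
    return [start * factor**i for i in range(count)]


def buckets_flag(bounds: list[float]) -> str:
    """Hypothetical helper: join the bounds the way the --buckets value is written."""
    return ",".join(repr(b) for b in bounds)


bounds = exponential_buckets(5e-05, 2, 20)
print(buckets_flag(bounds))
```

Multiplying by a power of two is exact in binary floating point, so the printed string matches the flag value character for character.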

The logs show no errors:

May 03 17:48:28 graf smokeping_prober[943298]: ts=2023-05-04T00:48:28.264Z caller=main.go:202 level=info msg="Starting prober" address=138.199.4.164 interval=1s size_bytes=56
May 03 17:48:28 graf smokeping_prober[943298]: ts=2023-05-04T00:48:28.291Z caller=main.go:220 level=info msg="Listening on" address=:9374
May 03 17:48:28 graf smokeping_prober[943298]: ts=2023-05-04T00:48:28.291Z caller=tls_config.go:195 level=info msg="TLS is disabled." http2=false

Config:

targets:
- hosts:
  - 89.187.177.134  # NYC
  - xxx 30 more hosts  # NYC
  interval: 1s
  network: ip
  protocol: icmp
  size: 56

System info:

$ go version
go version go1.15.15 linux/amd64
$ uname -a
Linux graf 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux

carlosrodfern commented 1 year ago

Check the source IP setting in the smokeping_prober.yaml file; the prober may be having trouble with routing because of it. Try commenting it out:

#  source: 127.0.1.1 # Source IP address to use. Default: None (automatic selection)

ext4xfs commented 9 months ago

I'm seeing the same issue, and setting a source doesn't seem to make any difference. I'm using Docker with the following YAML:

targets:
- hosts:
  - 8.8.8.8
  - 1.1.1.1
  interval: 1s # Duration, Default 1s.
  network: ip4 # One of ip, ip4, ip6. Default: ip (automatic IPv4/IPv6)
  protocol: icmp # One of icmp, udp. Default: icmp (Requires privileged operation)
  size: 56 # Packet data size in bytes. Default 56 (Range: 24 - 65535)
  # source:  # Source IP address to use. Default: None (automatic selection)

Metrics as scraped by Prometheus:

# HELP smokeping_requests_total Number of ping requests sent
# TYPE smokeping_requests_total counter
smokeping_requests_total{host="1.1.1.1",ip="1.1.1.1",source=""} 38
smokeping_requests_total{host="8.8.8.8",ip="8.8.8.8",source=""} 38
# HELP smokeping_response_duplicates_total The number of duplicated response packets.
# TYPE smokeping_response_duplicates_total counter
smokeping_response_duplicates_total{host="1.1.1.1",ip="1.1.1.1",source=""} 0
smokeping_response_duplicates_total{host="8.8.8.8",ip="8.8.8.8",source=""} 0
# HELP smokeping_response_duration_seconds A histogram of latencies for ping responses.
# TYPE smokeping_response_duration_seconds histogram
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="5e-05"} 0
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.0001"} 0
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.0002"} 0
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.0004"} 0
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.0008"} 0
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.0016"} 0
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.0032"} 4
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.0064"} 37
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.0128"} 37
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.0256"} 37
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.0512"} 37
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.1024"} 37
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.2048"} 37
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.4096"} 37
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.8192"} 37
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="1.6384"} 37
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="3.2768"} 37
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="6.5536"} 37
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="13.1072"} 37
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="26.2144"} 37
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="+Inf"} 37
smokeping_response_duration_seconds_sum{host="1.1.1.1",ip="1.1.1.1",source=""} 0.13052124300000004
smokeping_response_duration_seconds_count{host="1.1.1.1",ip="1.1.1.1",source=""} 37
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="5e-05"} 0
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="0.0001"} 0
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="0.0002"} 0
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="0.0004"} 0
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="0.0008"} 0
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="0.0016"} 0
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="0.0032"} 17
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="0.0064"} 37
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="0.0128"} 37
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="0.0256"} 37
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="0.0512"} 37
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="0.1024"} 37
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="0.2048"} 37
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="0.4096"} 37
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="0.8192"} 37
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="1.6384"} 37
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="3.2768"} 37
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="6.5536"} 37
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="13.1072"} 37
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="26.2144"} 37
smokeping_response_duration_seconds_bucket{host="8.8.8.8",ip="8.8.8.8",source="",le="+Inf"} 37
smokeping_response_duration_seconds_sum{host="8.8.8.8",ip="8.8.8.8",source=""} 0.11810808400000002
smokeping_response_duration_seconds_count{host="8.8.8.8",ip="8.8.8.8",source=""} 37
# HELP smokeping_response_ttl The last response Time To Live (TTL).
# TYPE smokeping_response_ttl gauge
smokeping_response_ttl{host="1.1.1.1",ip="1.1.1.1",source=""} 56
smokeping_response_ttl{host="8.8.8.8",ip="8.8.8.8",source=""} 117
# HELP smokeping_send_errors_total The number of errors when Pinger attempts to send packets.
# TYPE smokeping_send_errors_total counter
smokeping_send_errors_total{host="1.1.1.1",ip="1.1.1.1",source=""} 0
smokeping_send_errors_total{host="8.8.8.8",ip="8.8.8.8",source=""} 0

Based on this output, I'm not sure what is wrong with the heatmap query.
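Prometheus histogram buckets are cumulative: each le bucket counts every observation at or below its bound, so a heatmap cell is the difference between neighbouring buckets. Applying that to the 1.1.1.1 series above suggests the data itself is fine: 4 responses fell between 1.6 ms and 3.2 ms, 33 between 3.2 ms and 6.4 ms, and the mean (_sum / _count) is about 3.5 ms, which points at the query or scrape setup rather than the prober. A small Python sketch of that calculation; the regex parser is a hypothetical helper, not part of smokeping_prober or Prometheus:

```python
import re

# Bucket lines for host 1.1.1.1 copied from the /metrics output above.
# Lower buckets are all 0 and the omitted upper buckets are all 37,
# so dropping them does not change the per-bucket counts.
METRICS = '''
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.0008"} 0
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.0016"} 0
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.0032"} 4
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.0064"} 37
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="0.0128"} 37
smokeping_response_duration_seconds_bucket{host="1.1.1.1",ip="1.1.1.1",source="",le="+Inf"} 37
'''

BUCKET_RE = re.compile(r'_bucket\{[^}]*le="([^"]+)"\}\s+(\S+)')


def parse_buckets(text):
    """Return [(upper_bound, cumulative_count), ...] sorted by upper bound."""
    # float("+Inf") parses to math.inf, so the +Inf bucket sorts last.
    return sorted((float(le), float(v)) for le, v in BUCKET_RE.findall(text))


def per_bucket_counts(cumulative):
    """Turn cumulative le-buckets into counts per (lower, upper] interval.
    The first lower bound is taken as 0, since latencies are non-negative."""
    cells, prev_bound, prev_count = [], 0.0, 0.0
    for bound, count in cumulative:
        cells.append(((prev_bound, bound), count - prev_count))
        prev_bound, prev_count = bound, count
    return cells


cells = per_bucket_counts(parse_buckets(METRICS))
for (lo, hi), n in cells:
    if n:
        print(f"({lo}, {hi}] s: {n:g} responses")

mean = 0.13052124300000004 / 37  # _sum / _count from the output above
print(f"mean latency ~ {mean * 1000:.2f} ms")
```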

Nachtfalkeaw commented 8 months ago

Did you try the dashboard.json from this repository? It works for me out of the box.

My heatmap query looks like this. I modified it slightly, but both the original from the repository and this version work:

sum(rate(smokeping_response_duration_seconds_bucket{host=~"$target"}[1m])) by (le)
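Roughly, this query takes each cumulative bucket series, computes its per-second increase over the last minute, and then adds up the rates of all matching hosts that share the same le label. A simplified Python sketch of that pipeline, assuming regular scrapes and deliberately ignoring the window-edge extrapolation and counter-reset handling that Prometheus's real rate() performs; the sample numbers are hypothetical:

```python
from collections import defaultdict

# Hypothetical scrapes: (host, le) -> [(timestamp_seconds, cumulative_count), ...]
samples = {
    ("1.1.1.1", "0.0032"): [(0, 0), (30, 12)],
    ("1.1.1.1", "0.0064"): [(0, 0), (30, 30)],
    ("8.8.8.8", "0.0032"): [(0, 0), (30, 15)],
    ("8.8.8.8", "0.0064"): [(0, 0), (30, 30)],
}


def simple_rate(points):
    """Per-second increase between the first and last sample in the window,
    or None with fewer than two samples (Prometheus returns nothing then)."""
    if len(points) < 2:
        return None
    (t0, v0), (t1, v1) = points[0], points[-1]
    return (v1 - v0) / (t1 - t0)


def sum_rate_by_le(series):
    """Sketch of sum(rate(...)) by (le): drop the host label, keep le."""
    totals = defaultdict(float)
    for (_host, le), points in series.items():
        r = simple_rate(points)
        if r is not None:  # series without a rate simply drop out
            totals[le] += r
    return dict(totals)


print(sum_rate_by_le(samples))
```

The None branch is the important part for this issue: if the window never holds two samples, every bucket drops out and the heatmap has nothing to draw.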

The rate() range is [1m], so Prometheus needs to scrape smokeping_prober at least every 30s to get results: rate() needs at least two samples inside the window.
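To make the 30s figure concrete: with a fixed scrape interval s, a range window of length W is only guaranteed to contain floor(W / s) samples, because scrapes can land anywhere relative to the window edges. Since rate() needs two samples, s must be at most W / 2. A minimal Python sketch of that arithmetic, assuming perfectly regular scrapes; real scrapes jitter a little, so a window of roughly four scrape intervals is the commonly suggested safer margin:

```python
import math


def guaranteed_samples(window_s: float, scrape_interval_s: float) -> int:
    """Minimum number of samples a [window] range selector is guaranteed to
    see when scrapes happen exactly every scrape_interval_s seconds."""
    return math.floor(window_s / scrape_interval_s)


def max_scrape_interval(window_s: float, needed: int = 2) -> float:
    """Longest scrape interval that still guarantees `needed` samples."""
    return window_s / needed


for scrape in (15, 30, 60):
    n = guaranteed_samples(60, scrape)
    print(f"scrape every {scrape}s -> [1m] guarantees {n} sample(s); rate() ok: {n >= 2}")

print(max_scrape_interval(60))  # 30.0
```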

ext4xfs commented 8 months ago

@Nachtfalkeaw, yes, that's the one. Neither your query nor the original works for me. Pings run at a 1s interval; I switched to 15s just for fun and, as expected, got the same result.

ext4xfs commented 8 months ago

It works if I use [2m] instead of [1m], and I'm not sure why. What am I missing?
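A likely explanation, assuming Prometheus was still on its default global scrape_interval of 1m: a [1m] window then typically holds a single sample, so rate() returns nothing, while a [2m] window always holds at least two. Instead of hard-coding the range, Grafana offers the $__rate_interval variable, documented as max($__interval + scrape interval, 4 × scrape interval), where the scrape interval comes from the Prometheus data source settings unless overridden per query. The Python function below only reproduces that documented formula for illustration; it is not Grafana code, and the dashboard values are hypothetical:

```python
def rate_interval(interval_s: float, scrape_interval_s: float) -> float:
    """Mirror of the documented $__rate_interval formula:
    max($__interval + scrape interval, 4 * scrape interval)."""
    return max(interval_s + scrape_interval_s, 4 * scrape_interval_s)


# Hypothetical dashboard: $__interval of 15s, Prometheus scraping every 1m.
print(rate_interval(15, 60))  # 240 -> a [4m] window
# Same dashboard after lowering the scrape interval to 15s.
print(rate_interval(15, 15))  # 60 -> a [1m] window
```

In the query from this thread, that would mean writing [$__rate_interval] in place of [1m].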

ext4xfs commented 8 months ago

As you mentioned, the problem was my Prometheus scrape interval: it was set too long, so scrapes were too infrequent. (I'm new to Prometheus.)

weixiaolv commented 6 months ago

I just hit the same problem when starting smokeping_prober as a nologin user. Running it as root works.
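One plausible cause, not confirmed in this thread: with --privileged (as in the invocation at the top of this issue), the prober needs raw ICMP sockets, which on Linux require root or the CAP_NET_RAW capability. Unprivileged ICMP sockets are only permitted for groups inside the net.ipv4.ping_group_range sysctl, whose kernel default of "1 0" allows no group at all. A small diagnostic sketch in Python, meant to be run as the service user; the helper names are hypothetical and it only reads /proc, it does not change anything:

```python
from __future__ import annotations

import os

CAP_NET_RAW = 13  # capability number from <linux/capability.h>


def has_cap(capeff_hex: str, cap: int) -> bool:
    """True if bit `cap` is set in a CapEff mask from /proc/<pid>/status."""
    return bool((int(capeff_hex, 16) >> cap) & 1)


def parse_ping_group_range(text: str) -> tuple[int, int]:
    """Parse /proc/sys/net/ipv4/ping_group_range, e.g. '0\t2147483647'."""
    low, high = (int(x) for x in text.split())
    return low, high


def main() -> None:
    try:
        with open("/proc/self/status") as f:
            capeff = next(line.split()[1] for line in f if line.startswith("CapEff:"))
        print("euid:", os.geteuid(), "CAP_NET_RAW:", has_cap(capeff, CAP_NET_RAW))
        with open("/proc/sys/net/ipv4/ping_group_range") as f:
            low, high = parse_ping_group_range(f.read())
        # A range with low > high (such as "1 0") matches no gid.
        print("unprivileged ICMP allowed for gid", os.getgid(), ":", low <= os.getgid() <= high)
    except (OSError, StopIteration) as exc:
        print("could not read /proc (not Linux?):", exc)


if __name__ == "__main__":
    main()
```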