SuperQ / smokeping_prober

Prometheus style smokeping
Apache License 2.0

smokeping_response_duration_seconds_bucket stops growing #98

Closed xrayou closed 1 year ago

xrayou commented 1 year ago

After running for a while, the "smokeping_response_duration_seconds_bucket" metrics for one IP stop growing, even though that IP can still be pinged from the host.

We use it to monitor internet connectivity. The problem has occurred several times on different hosts, and the server load is normal.


stephanep commented 1 year ago

I have this issue too (using the Debian package prometheus-smokeping-prober 0.4.1-2+b5 amd64). The bucket values, as well as sum and count, suddenly and unexpectedly stop increasing. In my case this happens close to 65535, so I suspect an integer overflow.

For example:

smokeping_response_duration_seconds_bucket{host="192.168.0.1",ip="192.168.0.1",le="0.02"} 0
smokeping_response_duration_seconds_bucket{host="192.168.0.1",ip="192.168.0.1",le="0.04"} 61344
smokeping_response_duration_seconds_bucket{host="192.168.0.1",ip="192.168.0.1",le="0.06"} 62885
smokeping_response_duration_seconds_bucket{host="192.168.0.1",ip="192.168.0.1",le="0.08"} 63863
smokeping_response_duration_seconds_bucket{host="192.168.0.1",ip="192.168.0.1",le="0.1"} 64474
smokeping_response_duration_seconds_bucket{host="192.168.0.1",ip="192.168.0.1",le="0.12"} 64894
smokeping_response_duration_seconds_bucket{host="192.168.0.1",ip="192.168.0.1",le="0.14"} 65184
smokeping_response_duration_seconds_bucket{host="192.168.0.1",ip="192.168.0.1",le="0.16"} 65378
smokeping_response_duration_seconds_bucket{host="192.168.0.1",ip="192.168.0.1",le="0.18"} 65461
smokeping_response_duration_seconds_bucket{host="192.168.0.1",ip="192.168.0.1",le="0.2"} 65467
smokeping_response_duration_seconds_bucket{host="192.168.0.1",ip="192.168.0.1",le="+Inf"} 65523
smokeping_response_duration_seconds_sum{host="192.168.0.1",ip="192.168.0.1"} 2267.4780574499555
smokeping_response_duration_seconds_count{host="192.168.0.1",ip="192.168.0.1"} 65523

The unit keeps running and observably keeps sending pings and getting replies. No log events.

stephanep commented 1 year ago

OK, regarding my own case: the version I had on my system (from Debian 11) reproduces the problem on every attempt, but it appears to be fixed in a more recent version (built from the master branch of this repo).

- Reproducing (Debian 11 package):

stephane@v-debian:~$ prometheus-smokeping-prober --web.listen-address=127.0.0.1:9374 --buckets=0.02,0.04,0.06,0.08,0.1,0.12,0.14,0.16,0.18,0.2 --ping.interval=1ms 192.168.88.1 &

stephane@v-debian:~$ curl http://127.0.0.1:9374/metrics 2>/dev/null | grep -E 'smokeping_response_duration_seconds|duplicate' | grep -Ev '^#'
smokeping_response_duplicates_total{host="192.168.88.1",ip="192.168.88.1"} 269999
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",le="0.02"} 65535
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",le="0.04"} 65535
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",le="0.06"} 65535
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",le="0.08"} 65535
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",le="0.1"} 65535
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",le="0.12"} 65535
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",le="0.14"} 65535
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",le="0.16"} 65535
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",le="0.18"} 65535
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",le="0.2"} 65535
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",le="+Inf"} 65535
smokeping_response_duration_seconds_sum{host="192.168.88.1",ip="192.168.88.1"} 13.415138290999874
smokeping_response_duration_seconds_count{host="192.168.88.1",ip="192.168.88.1"} 65535


(the buckets stop growing after 65535 packets, and smokeping_response_duplicates_total starts increasing instead)

- Not reproducing:

stephane@v-debian:~$ git clone 'https://github.com/SuperQ/smokeping_prober.git' && cd smokeping_prober
stephane@v-debian:~$ make
stephane@v-debian:~/smokeping_prober$ ./smokeping_prober --version
smokeping_prober, version 0.6.1 (branch: master, revision: b0bd6b6222489e4e1dd7a75077ecc2615ce95ee6)
  build user: stephane@v-debian
  build date: 20230509-22:25:29
  go version: go1.19.8
  platform: linux/amd64
  tags: netgo static_build

stephane@v-debian:~/smokeping_prober$ ./smokeping_prober --web.listen-address=127.0.0.1:9374 --buckets=0.02,0.04,0.06,0.08,0.1,0.12,0.14,0.16,0.18,0.2 --ping.interval=1ms 192.168.88.1 &

stephane@v-debian:~/smokeping_prober$ curl http://127.0.0.1:9374/metrics 2>/dev/null | grep -E 'smokeping_response_duration_seconds|duplicate' | grep -Ev '^#'
smokeping_response_duplicates_total{host="192.168.88.1",ip="192.168.88.1",source=""} 0
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",source="",le="0.02"} 179106
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",source="",le="0.04"} 179107
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",source="",le="0.06"} 179107
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",source="",le="0.08"} 179107
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",source="",le="0.1"} 179107
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",source="",le="0.12"} 179107
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",source="",le="0.14"} 179107
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",source="",le="0.16"} 179107
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",source="",le="0.18"} 179107
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",source="",le="0.2"} 179107
smokeping_response_duration_seconds_bucket{host="192.168.88.1",ip="192.168.88.1",source="",le="+Inf"} 179107
smokeping_response_duration_seconds_sum{host="192.168.88.1",ip="192.168.88.1",source=""} 41.917291788000156
smokeping_response_duration_seconds_count{host="192.168.88.1",ip="192.168.88.1",source=""} 179107



However, I do not think the OP has the same issue, because their bucket values increase above 65535. @xrayou, which version of smokeping_prober are you using? Did you build from the repo or use a distro package? Are you seeing smokeping_response_duplicates_total increase?

SuperQ commented 1 year ago

This appears to be a Debian bug. Please avoid the Debian packages: Debian ignores Go modules for dependencies, which introduces bugs.

SuperQ commented 1 year ago

Duplicate of https://github.com/SuperQ/smokeping_prober/issues/51

stephanep commented 1 year ago

Good catch, I'll see if the Debian package can be fixed. I am not sure the OP's issue is the same, however: they do not mention using Debian packages and their values go above 65535, so I believe it is likely a different problem.
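
The distinction drawn in this thread (counters frozen at or below the 16-bit limit while duplicates climb, versus a stall above that limit) can be written down as a small heuristic. The Go sketch below is an assumption-laden illustration, not part of smokeping_prober: the scrape type, the looksLikeSeqWrap function, and every "earlier scrape" value in main are hypothetical. Only the 65535, 269999, and 179107 figures come from the output quoted above.

```go
package main

import "fmt"

// scrape holds two values read from one /metrics scrape (hypothetical type):
// smokeping_response_duration_seconds_count and
// smokeping_response_duplicates_total.
type scrape struct {
	count      float64
	duplicates float64
}

// seqSpace is the number of distinct 16-bit ICMP sequence numbers.
const seqSpace = 1 << 16

// looksLikeSeqWrap reports whether two successive scrapes match the pattern
// seen with the Debian package: the histogram count has stalled at or below
// the 16-bit limit while the duplicates counter keeps increasing.
func looksLikeSeqWrap(before, after scrape) bool {
	stalled := after.count == before.count
	dupGrowing := after.duplicates > before.duplicates
	withinSeqSpace := after.count <= seqSpace
	return stalled && dupGrowing && withinSeqSpace
}

func main() {
	// Debian package pattern; the earlier duplicates value is made up.
	fmt.Println(looksLikeSeqWrap(scrape{65535, 269000}, scrape{65535, 269999}))
	// Build from master: count still growing, no duplicates.
	fmt.Println(looksLikeSeqWrap(scrape{179000, 0}, scrape{179107, 0}))
	// A stall above 65535 (made-up values) does not match the pattern.
	fmt.Println(looksLikeSeqWrap(scrape{120000, 0}, scrape{120000, 0}))
}
```

Only the first case matches. A stall above 65535, as the OP describes, would need a different explanation.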