SuperQ / smokeping_prober

Prometheus style smokeping
Apache License 2.0

No response from target causes all other targets to be polled #72

Closed lindhor closed 2 years ago

lindhor commented 2 years ago

I have noticed that when several targets are configured and some of them do not reply to pings, the targets that do respond are affected as well. That is, smokeping does not return correct statistics for any configured target if some of the others are down.

Looking at the returned metrics, it seems to get stuck. smokeping_requests_total stalls at 2-3 failed requests per target. The targets that are online have a smokeping_response_duration_seconds_count of 1 or 2 and a smokeping_response_ttl of -1 forever (the offline ones have 0 for both).

If I remove all targets that are currently offline from the config, everything works well.

If I bring all targets online without restarting smokeping, it still does not start providing correct metrics for all of them. Looking at the network traffic in Wireshark, I see no ICMP or ARP requests being sent. After restarting smokeping, everything works well.

I would have expected that the status of one target would not affect the others. I would also expect it to start providing stats for targets when they come online.

All targets have name resolution, so it seems related to either the ARP request or the ICMP request not being answered. I run version 0.6.0 on Windows Server 2019.

SuperQ commented 2 years ago

Interesting, I have never seen this happen myself. The probes are run independently in separate goroutines so they should not affect each other.

This may be a Windows-specific problem. I have never used this on Windows and I have no systems to test with.

lindhor commented 2 years ago

Just to try, I upgraded the go-ping dependency to the latest 1.1.0 via go get github.com/go-ping/ping and rebuilt, which helped. I ran with the same config and it works fine. Would you consider upgrading to that go-ping version?

SuperQ commented 2 years ago

That makes no sense. v0.6.0 already includes github.com/go-ping/ping v1.1.0.

lindhor commented 2 years ago

Sorry, I think it was my mistake. I had managed to clone the source in two different directories and picked the built binary from the wrong one. The one that I had issues with was 0.5.1 with go-ping v0.0.0-20211130115550-779d1e919534. And of course you're correct that go-ping 1.1.0 is in 0.6.0. Anyway I am not able to repeat the issue in 0.6.0 so I'm closing this.

SuperQ commented 2 years ago

Ahh, good to know. We made a few tracking fixes that sound like they address what you saw in the old version.

Thanks for confirming they worked.