SuperQ / smokeping_prober

Prometheus style smokeping
Apache License 2.0
554 stars, 73 forks

Rework dashboard #151

Open SuperQ opened 4 months ago

SuperQ commented 4 months ago

Rework the dashboard to be more useful.

SuperQ commented 4 months ago

Fixes: https://github.com/SuperQ/smokeping_prober/issues/150

SuperQ commented 4 months ago

Fixes: https://github.com/SuperQ/smokeping_prober/issues/100

SuperQ commented 4 months ago

Fixes: https://github.com/SuperQ/smokeping_prober/issues/90

dominikh commented 4 months ago

> Add support for native histograms.

This seems to be breaking the dashboard for people who aren't using native histograms. I'm getting this for the Average Latency graph:

Status: 500. Message: bad_data: invalid parameter "query": 1:1: parse error: unknown function with name "histogram_avg"

The new dashboard doesn't seem to break out multiple ping targets into their own panels anymore. This was useful to compare hosts and check if they behaved differently, e.g. due to routing. Being able to look at the sum of all hosts (by setting host and ip to all) is definitely useful, though.

And I can see how breaking them out would be bad if someone had dozens of targets. I'm not well-versed in Grafana; is there a way to add a checkbox that toggles this behavior?

SuperQ commented 4 months ago

What version of Prometheus do you have?

SuperQ commented 4 months ago

I can add the row configuration back in.

dominikh commented 4 months ago

> What version of Prometheus do you have?

I'm on version 2.47.2. `histogram_avg` seems to have been added in 2.51.0, which was only released in March 2024. Even then, the function is documented as

> This function only acts on native histograms, which are an experimental feature.

and most users probably have their data in classic histograms, not native ones.

SuperQ commented 4 months ago

Yes, and that's why there is an `or` in the query now. If the native histogram query doesn't return data, it will fall back to the classic histogram data.
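For readers following along, the fallback described here can be sketched roughly as below. This is an illustration, not the query from the actual dashboard: the metric name `smokeping_response_duration_seconds` and the fixed `5m` window are assumptions, and the real panel may use different selectors, label matchers, or a Grafana interval variable.

```promql
# Sketch only: metric name and the 5m window are assumed, not taken from the dashboard JSON.

# Left side: average latency from native histogram samples.
# Returns an empty result when no native histograms are stored.
histogram_avg(rate(smokeping_response_duration_seconds[5m]))
or
# Right side: the same average computed from the classic _sum and _count series.
# `or` keeps only right-side series whose label sets are absent on the left.
(
    rate(smokeping_response_duration_seconds_sum[5m])
  /
    rate(smokeping_response_duration_seconds_count[5m])
)
```

Both sides drop the metric name, so the label sets line up and `or` does not return a series twice. The fallback only applies once the query parses, though: a Prometheus version that predates `histogram_avg` rejects the whole expression with a parse error like the one reported above, and the right-hand side is never evaluated.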

bboehmke commented 3 months ago

Hi,

I also checked the reworked dashboard, and it looks like the `or` is missing in the third panel, Average Latency. This results in no data if native histograms are not enabled.