free / prometheus

The Prometheus monitoring system and time series database.
https://prometheus.io/
Apache License 2.0

xrate not good enough #18

Status: Open. baryluk opened this issue 1 year ago.

baryluk commented 1 year ago

prometheus-2.39.1+0.0.3.linux-amd64

Some improvements, but this still needs to be fixed:

[screenshot]

Scrape interval is 5s

free commented 1 year ago

It is not entirely clear what you mean. Are you referring to the jittery ramp-up? The few small jitters later on? The fact that there exists a ramp-up at all?

baryluk commented 1 year ago

Hi. Yes, the ramp-up.

In most situations, especially for graphing, I would prefer to just get fewer points. Rather than interpreting rate[5m] as an average over 5 minutes in which missing data counts as 0, I would like it to mean smoothing over 5m windows while ignoring points that don't exist.

I know that with such a definition the variance at the start of a time series might be higher, so maybe require a minimum amount of data in the window (say, 1/4 of the time range), or something similar.

The current semantics are a bit weird, IMHO.
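To make the two readings concrete, here is a minimal sketch. It assumes a counter that grows by 1 per second starting at t=0, the 5s scrape interval mentioned above, a closed window, and no extrapolation, so it is a simplified model and not Prometheus's (or xrate's) actual algorithm. `rate_whole_range`, `rate_present_only`, and the `min_coverage` threshold of 1/4 are hypothetical names for the behaviours described in this thread.

```python
# Hypothetical, simplified model of the two readings of rate(metric[5m]);
# it does NOT reproduce the real extrapolation logic of rate() or xrate.
SCRAPE_S = 5     # scrape interval from the issue
RANGE_S = 300    # the [5m] range

# A counter that appears at t=0 and grows by 1 per second from then on.
samples = [(t, float(t)) for t in range(0, 3600, SCRAPE_S)]


def in_window(t):
    """Samples whose timestamps fall in the closed window [t - RANGE_S, t]."""
    return [(ts, v) for ts, v in samples if t - RANGE_S <= ts <= t]


def rate_whole_range(t):
    """Increase divided by the full range: time before the series existed counts as 0."""
    w = in_window(t)
    if len(w) < 2:
        return None
    return (w[-1][1] - w[0][1]) / RANGE_S


def rate_present_only(t, min_coverage=0.25):
    """Increase divided by the time actually covered by samples; returns no
    value while coverage is below `min_coverage` of the range (hypothetical)."""
    w = in_window(t)
    if len(w) < 2:
        return None
    covered = w[-1][0] - w[0][0]
    if covered < min_coverage * RANGE_S:
        return None
    return (w[-1][1] - w[0][1]) / covered


for t in (30, 60, 150, 300, 600):
    print(t, rate_whole_range(t), rate_present_only(t))
# 30 0.1 None
# 60 0.2 None
# 150 0.5 1.0
# 300 1.0 1.0
# 600 1.0 1.0
```

In this model the whole-range reading ramps up linearly for the first 5 minutes, while the present-only reading shows no points until the coverage threshold is met and then jumps straight to the true rate.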

free commented 1 year ago

Consider that increase() is essentially implemented as rate() * interval. With your proposed approach, a metric that appears, increments by 1 per second, and then disappears after a few minutes would produce a total increase of 86400 over 1d, even though the counter itself only grew by a few hundred.
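The arithmetic behind the 86400 figure, as a small sketch. It assumes a 5-minute lifetime (300 s) as a concrete value for "a few minutes", ignores any minimum-coverage threshold, and treats increase() as exactly rate() times the range, which is a simplification of what Prometheus actually computes.

```python
# Hypothetical numbers for the example above: a counter that increments by
# 1 per second and exists for 5 minutes, queried with increase(...[1d]).
RANGE_S = 86_400     # 1d range
LIFETIME_S = 300     # assumed lifetime: 5 minutes
PER_SECOND = 1.0     # counter increments by 1 per second

actual_increase = PER_SECOND * LIFETIME_S  # what the counter really grew by

# Current semantics: the observed increase is spread over the whole range.
rate_current = actual_increase / RANGE_S
increase_current = rate_current * RANGE_S  # increase() ~ rate() * range

# Proposed semantics: the rate is averaged only over the time the series existed.
rate_proposed = actual_increase / LIFETIME_S
increase_proposed = rate_proposed * RANGE_S

print(round(actual_increase), round(increase_current), round(increase_proposed))
# 300 300 86400
```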

Similarly, if you compute the rate of short-lived metrics over multiple hours, it would look as if they were increasing at their peak rate for the whole duration (e.g. a metric that appears and disappears after 5 minutes would produce a 65-minute line, the 5 minutes of data plus the 1h lookback, if you used rate(metric[1h]) on it). Another way of looking at it is that one would expect the area under a chart to be proportional to the increase of the counter (the rate is the derivative of the counter; the counter is the integral of the rate). A 65-minute constant rate for a counter that existed for only 5 minutes hugely overestimates the counter increase.
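A minimal simulation of the 65-minute example. It assumes a counter growing by 1/s that exists for 5 minutes, the 5s scrape interval from the issue, a 5s evaluation step, a closed window, and no extrapolation, so it is a simplified model rather than the real rate() or xrate. `rate_current` and `rate_proposed` are hypothetical names for the two semantics discussed here.

```python
# Hypothetical, simplified model (NOT the real extrapolating rate()):
# a counter that increases by 1/s, exists for 5 minutes, is scraped every 5 s,
# and is graphed as rate(metric[1h]) with a 5 s evaluation step.
SCRAPE_S, LIFETIME_S, RANGE_S, END_S = 5, 300, 3600, 5400

samples = [(t, float(t)) for t in range(0, LIFETIME_S + 1, SCRAPE_S)]


def in_window(t):
    """Samples whose timestamps fall in the closed window [t - RANGE_S, t]."""
    return [(ts, v) for ts, v in samples if t - RANGE_S <= ts <= t]


def rate_current(t):
    """Observed increase spread over the whole range (gaps count as zero)."""
    w = in_window(t)
    return (w[-1][1] - w[0][1]) / RANGE_S if len(w) >= 2 else 0.0


def rate_proposed(t):
    """Observed increase divided only by the time covered by samples."""
    w = in_window(t)
    return (w[-1][1] - w[0][1]) / (w[-1][0] - w[0][0]) if len(w) >= 2 else 0.0


steps = range(0, END_S + 1, SCRAPE_S)
results = {}
for name, fn in (("current", rate_current), ("proposed", rate_proposed)):
    values = [fn(t) for t in steps]
    results[name] = (
        sum(SCRAPE_S for r in values if r > 0) / 60,  # minutes with a non-zero rate
        max(values),                                  # peak graphed rate
        sum(r * SCRAPE_S for r in values),            # area under the graphed line
    )
    minutes, peak, area = results[name]
    print(f"{name}: non-zero for {minutes:.1f} min, peak {peak:.3f}/s, area {area:.0f}")
# current: non-zero for 64.9 min, peak 0.083/s, area 300
# proposed: non-zero for 64.9 min, peak 1.000/s, area 3895
```

Under this model both lines span about 65 minutes, but only the current semantics keep the area (300) equal to the counter's real increase; the proposed semantics graph the peak rate for the whole span, giving an area roughly 13 times the real increase. With the 1/4 minimum-coverage threshold suggested earlier in the thread, this 5-minute series would instead produce no points at all over a 1h range.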

Plus, there's the potentially huge variance when the metric appears/disappears.

It's not that one approach is strictly better than the other. But depending on exactly what one wants, one approach might be preferable to the other. And the truth is that switching the current behavior for a new one is likely to surprise/annoy more people and break more alerts/dashboards than retaining it.