Closed: sdmichelini closed this issue 3 weeks ago
I'm trying to reproduce this with a minimal case. I created a minimal tool that creates and exposes a histogram:
package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Histogram with the non-standard bucket layout from the report, including a 0 bucket.
	my_duration = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "my_tracing_duration_seconds",
		Help:    "tracing duration in seconds",
		Buckets: []float64{.0, .001, .01, .05, .1, .5, 1, 2, 3, 5, 7, 10, 15, 20, 25, 30},
	})
)

func init() {
	prometheus.MustRegister(my_duration)
}

func main() {
	// Serve the default registry on :8080/metrics, then record a few observations.
	http.Handle("/metrics", promhttp.Handler())
	go http.ListenAndServe(":8080", nil)
	simulate()
}

func simulate() {
	my_duration.Observe(0.001)
	my_duration.Observe(0.001)
	my_duration.Observe(0.001)
	my_duration.Observe(0.001)
	my_duration.Observe(0.292702106)
	// Keep the process alive so the endpoint can be scraped.
	for {
		time.Sleep(1 * time.Second)
	}
}
Then I compiled and ran it.
Then I used prometheus_scrape to scrape the served metrics with this non-standard bucket layout, and specified prometheus_exporter as the output (a minimal config sketch follows).
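For reference, the scrape/export pipeline can be set up roughly like the sketch below. This assumes the project under test is Fluent Bit, since the prometheus_scrape input and prometheus_exporter output plugin names match the issue; the host, ports, and scrape interval here are placeholders, not values taken from this report.

[INPUT]
    # scrape the Go test program started above
    name            prometheus_scrape
    host            127.0.0.1
    port            8080
    metrics_path    /metrics
    scrape_interval 10s

[OUTPUT]
    # re-expose everything that was scraped
    name  prometheus_exporter
    match *
    host  0.0.0.0
    port  2021

The exporter's output can then be compared against the application's own /metrics endpoint.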
However, the result I obtained was not broken in this case:
# HELP my_tracing_duration_seconds tracing duration in seconds
# TYPE my_tracing_duration_seconds histogram
my_tracing_duration_seconds_bucket{le="0.0"} 0
my_tracing_duration_seconds_bucket{le="0.001"} 4
my_tracing_duration_seconds_bucket{le="0.01"} 4
my_tracing_duration_seconds_bucket{le="0.05"} 4
my_tracing_duration_seconds_bucket{le="0.1"} 4
my_tracing_duration_seconds_bucket{le="0.5"} 5
my_tracing_duration_seconds_bucket{le="1.0"} 5
my_tracing_duration_seconds_bucket{le="2.0"} 5
my_tracing_duration_seconds_bucket{le="3.0"} 5
my_tracing_duration_seconds_bucket{le="5.0"} 5
my_tracing_duration_seconds_bucket{le="7.0"} 5
my_tracing_duration_seconds_bucket{le="10.0"} 5
my_tracing_duration_seconds_bucket{le="15.0"} 5
my_tracing_duration_seconds_bucket{le="20.0"} 5
my_tracing_duration_seconds_bucket{le="25.0"} 5
my_tracing_duration_seconds_bucket{le="30.0"} 5
my_tracing_duration_seconds_bucket{le="+Inf"} 5
my_tracing_duration_seconds_sum 0.29670210600000002
my_tracing_duration_seconds_count 5
@sdmichelini Are there any further requirements to break the consistency of the histogram? Is passing through a Prometheus instance needed? How can I create the broken histogram that uses custom buckets? Is it collected from node_exporter, or does a custom client send it to a Prometheus endpoint?
All I did was expose a Prometheus histogram with the buckets above as an input, and I got the following output.
I tried to use this Prometheus text format file:
% cat problematic_prom/histgram.prom [Fail]
# HELP my_tracing_duration_seconds tracing duration in seconds
# TYPE my_tracing_duration_seconds histogram
my_tracing_duration_seconds_bucket{le="0"} 0
my_tracing_duration_seconds_bucket{le="0.001"} 4
my_tracing_duration_seconds_bucket{le="0.01"} 4
my_tracing_duration_seconds_bucket{le="0.05"} 4
my_tracing_duration_seconds_bucket{le="0.1"} 4
my_tracing_duration_seconds_bucket{le="0.5"} 5
my_tracing_duration_seconds_bucket{le="1"} 5
my_tracing_duration_seconds_bucket{le="2"} 5
my_tracing_duration_seconds_bucket{le="3"} 5
my_tracing_duration_seconds_bucket{le="5"} 5
my_tracing_duration_seconds_bucket{le="7"} 5
my_tracing_duration_seconds_bucket{le="10"} 5
my_tracing_duration_seconds_bucket{le="15"} 5
my_tracing_duration_seconds_bucket{le="20"} 5
my_tracing_duration_seconds_bucket{le="25"} 5
my_tracing_duration_seconds_bucket{le="30"} 5
my_tracing_duration_seconds_bucket{le="+Inf"} 5
my_tracing_duration_seconds_sum 0.296702106
my_tracing_duration_seconds_count 5
And ingested it with node_exporter's textfile collector:
$ ./node_exporter --collector.textfile.directory=problematic_prom
Also, querying Prometheus itself shows that the buckets and values are not broken:
% curl 'http://localhost:9090/api/v1/query_range?query=my_tracing_duration_seconds_bucket&start=2024-06-11T20:10:30.781Z&end=2024-06-30T20:11:00.781Z&step=3h&format=prometheus' | jq .
{
"status": "success",
"data": {
"resultType": "matrix",
"result": [
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "+Inf"
},
"values": [
[
1719216630.781,
"5"
]
]
},
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "0"
},
"values": [
[
1719216630.781,
"0"
]
]
},
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "0.001"
},
"values": [
[
1719216630.781,
"4"
]
]
},
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "0.01"
},
"values": [
[
1719216630.781,
"4"
]
]
},
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "0.05"
},
"values": [
[
1719216630.781,
"4"
]
]
},
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "0.1"
},
"values": [
[
1719216630.781,
"4"
]
]
},
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "0.5"
},
"values": [
[
1719216630.781,
"5"
]
]
},
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "1"
},
"values": [
[
1719216630.781,
"5"
]
]
},
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "10"
},
"values": [
[
1719216630.781,
"5"
]
]
},
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "15"
},
"values": [
[
1719216630.781,
"5"
]
]
},
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "2"
},
"values": [
[
1719216630.781,
"5"
]
]
},
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "20"
},
"values": [
[
1719216630.781,
"5"
]
]
},
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "25"
},
"values": [
[
1719216630.781,
"5"
]
]
},
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "3"
},
"values": [
[
1719216630.781,
"5"
]
]
},
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "30"
},
"values": [
[
1719216630.781,
"5"
]
]
},
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "5"
},
"values": [
[
1719216630.781,
"5"
]
]
},
{
"metric": {
"__name__": "my_tracing_duration_seconds_bucket",
"instance": "localhost:9100",
"job": "node_exporter",
"le": "7"
},
"values": [
[
1719216630.781,
"5"
]
]
}
]
}
}
We cannot reproduce; changing milestone.
This issue is stale because it has been open 90 days with no activity. Remove the stale label or comment, or this will be closed in 5 days. Maintainers can add the exempt-stale label.
This issue was closed because it has been stalled for 5 days with no activity.
Bug Report
Describe the bug
When using Prometheus as a source and exporting it, the buckets on the histogram get messed up. In the example below, new buckets got added and counts were missed for some of the le values in the bucket. Since counts are missed, the values rendered by the histogram_quantile function in Grafana are incorrect.
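For context, a dashboard panel of this kind typically computes percentiles with a query along these lines (a sketch; the quantile and range window are illustrative, not taken from the report):

# 95th percentile latency over a 5-minute window; if bucket counts are
# dropped or shifted between le values, this estimate comes out wrong
histogram_quantile(0.95, rate(my_tracing_duration_seconds_bucket[5m]))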
To Reproduce
Example Input
Example Output
Expected behavior
my_tracing_duration_seconds_count
Your Environment
prometheus_scrape input
prometheus_exporter output