Title: Prometheus histogram buckets act like upper bound exclusive
Description:
I am using the metrics exposed by Envoy at the endpoint /stats/prometheus. I've encountered an issue with the latency metric being added to the histogram buckets. Specifically, when the latency matches the upper bound of a bucket, it falls into the next bucket instead.
For instance, I have configured the following histogram buckets (in milliseconds): [1, 5, 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 500, 1000, 5000, 10000, 30000].
If the request latency is 24, it falls into the le="25" bucket.
If the request latency is 25, it falls into the le="50" bucket.
This behavior seems to indicate that the configuration is upper bound exclusive.
Is this behavior expected?
If it is upper bound exclusive, why is it labeled with le (less than or equal to)?
Is there any way to make this configuration upper bound inclusive?
In the attached screenshot, you can see the following statistics:
19 requests are recorded at 22ms.
10 requests are recorded at 23ms.
1 request is recorded at 25ms.
But the counts were recorded as below:
envoy_cluster_upstream_rq_time_bucket{le="10"} 0
envoy_cluster_upstream_rq_time_bucket{le="25"} 29
envoy_cluster_upstream_rq_time_bucket{le="50"} 30
I verified this information on the stats endpoint (as shown in the attached screenshot) and in the stats sink (also shown in the attached screenshot).
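To make the discrepancy concrete, here is a minimal Python sketch (not Envoy's implementation) that recomputes the cumulative bucket counts from the observations above under both inclusive and exclusive upper bounds. With Prometheus's documented `le` (less than or equal) semantics, the 25ms observation should already be counted in the le="25" bucket; the observed counters instead match an exclusive upper bound.

```python
# Minimal sketch (not Envoy code): compare inclusive vs. exclusive
# upper-bound bucketing for the latencies reported above.
bounds = [1, 5, 10, 25, 50, 100, 150, 200, 250, 300, 350, 400,
          500, 1000, 5000, 10000, 30000]
latencies = [22] * 19 + [23] * 10 + [25]  # 30 observations in total

def cumulative_counts(latencies, bounds, inclusive):
    # Prometheus histogram buckets are cumulative: each le="X" counter
    # holds every observation at or below X (inclusive) or strictly
    # below X (exclusive).
    if inclusive:
        return {b: sum(1 for v in latencies if v <= b) for b in bounds}
    return {b: sum(1 for v in latencies if v < b) for b in bounds}

inclusive = cumulative_counts(latencies, bounds, inclusive=True)
exclusive = cumulative_counts(latencies, bounds, inclusive=False)

# Prometheus le semantics (inclusive): le="25" should already be 30.
print(inclusive[25], inclusive[50])  # 30 30
# The reported Envoy counters match exclusive bounds: le="25" is 29.
print(exclusive[25], exclusive[50])  # 29 30
```

The exclusive counts (29 at le="25", 30 at le="50") reproduce exactly the `envoy_cluster_upstream_rq_time_bucket` values shown above, while the inclusive counts are what the `le` label implies.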
Other details:
Currently using Emissary-Ingress as a wrapper on top of Envoy.
Emissary-Ingress version: 3.9
Envoy version: 1.27.2
Verified the same behaviour with Envoy Gateway as a wrapper on top of Envoy.
Envoy Gateway version: 1.1
Envoy version: 1.31.0
Repro steps:
Please generate requests with latencies that match some of the configured histogram buckets. For example, if the buckets are set as follows (in milliseconds): [1, 5, 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 500, 1000, 5000, 10000, 30000], you could generate a request with a latency of 25ms.