Stackdriver / stackdriver-prometheus-sidecar

A sidecar for the Prometheus server that can send metrics to Stackdriver.
https://cloud.google.com/monitoring/kubernetes-engine/prometheus
Apache License 2.0

Fix Hash Value For Prometheus Histogram Metrics. #175

Closed · StevenYCChou closed this 4 years ago

StevenYCChou commented 4 years ago

Through internal investigation, I observed that one of the queue lengths was consistently high and blocked the whole queue manager. I identified a large volume of points with hash value 0 being enqueued onto shard #0, which makes the sharding imbalanced and leaves the sharded queue constantly blocked by shard #0.
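
To illustrate why a hash of 0 causes this, here is a minimal sketch (not the sidecar's actual code) of sharding a point by its series hash. Any point whose hash is 0 always lands on shard #0, so a flood of zero hashes piles onto that one shard:

```go
package main

import "fmt"

// shardIndex picks a shard for a point from its series hash.
// This mirrors the general "hash mod number-of-shards" pattern;
// the real queue manager's details may differ.
func shardIndex(hash uint64, numShards int) int {
	return int(hash % uint64(numShards))
}

func main() {
	numShards := 4
	// Hash 0 dominates, as observed for the histogram points.
	hashes := []uint64{0, 0, 0, 0x9e3779b9, 0}
	counts := make([]int, numShards)
	for _, h := range hashes {
		counts[shardIndex(h, numShards)]++
	}
	fmt.Println(counts) // [4 1 0 0]: shard #0 receives almost everything
}
```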

When the Stackdriver Prometheus Sidecar transforms Prometheus histogram metrics into Stackdriver time series with value type distribution, it should return the hash value derived by the series cache rather than 0. This change fixes the undesired behavior.
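
For illustration only, a minimal Go sketch of the pattern this PR fixes (the names buildDistribution, seriesCacheEntry, and the return signature below are hypothetical, not the sidecar's exact code): the histogram branch previously returned a hard-coded 0 as the hash, and the fix propagates the hash the series cache already computed.

```go
package main

import "fmt"

// Hypothetical stand-ins for the sidecar's internal types.
type distribution struct{ count, sum float64 }

type seriesCacheEntry struct {
	hash  uint64 // hash derived by the series cache for this series
	value distribution
}

// buildDistribution sketches the histogram branch of the sample builder.
// Before the fix, the hash returned here was a literal 0, so every
// histogram point hashed to shard #0. The fix returns entry.hash instead.
func buildDistribution(entry seriesCacheEntry) (distribution, uint64, error) {
	// Fixed behavior: propagate the cache-derived hash with the point.
	return entry.value, entry.hash, nil
}

func main() {
	entry := seriesCacheEntry{hash: 0xdeadbeef, value: distribution{count: 3, sum: 1.5}}
	d, h, err := buildDistribution(entry)
	fmt.Println(d, h, err) // the point now carries a non-zero, per-series hash
}
```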

bmoyles0117 commented 4 years ago

This is great! Does this potentially apply to all of the time series that return a nil error? Specifically, other calls to getResetAdjusted raise a potential concern in my mind.

StevenYCChou commented 4 years ago

> This is great! Does this potentially apply to all of the time series that return a nil error? Specifically, other calls to getResetAdjusted raise a potential concern in my mind.

My understanding is that if sampleBuilder.next() returns both outSample as nil and error as nil, the caller PrometheusReader.Run() won't append the (Stackdriver) sample to the queue. All return values related to getResetAdjusted satisfy this condition.
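
A minimal sketch of that caller-side contract (the next function, sample type, and loop below are illustrative stand-ins, not the sidecar's exact code): when the builder yields a nil sample together with a nil error, the reader loop simply skips it, so nothing from those code paths is ever enqueued, regardless of the hash returned alongside it.

```go
package main

import "fmt"

// sample stands in for the Stackdriver time series produced by the
// sample builder; nil means "nothing to send for this input".
type sample struct{ value float64 }

// next stands in for sampleBuilder.next(): it may legitimately return
// (nil, 0, nil), e.g. when reset adjustment decides the point is skipped.
func next(skip bool) (*sample, uint64, error) {
	if skip {
		return nil, 0, nil
	}
	return &sample{value: 1}, 42, nil
}

func main() {
	for _, skip := range []bool{true, false} {
		s, hash, err := next(skip)
		if err != nil {
			fmt.Println("drop on error:", err)
			continue
		}
		if s == nil {
			// The reader loop never enqueues a nil sample, so the hash
			// returned alongside it is irrelevant on this path.
			continue
		}
		fmt.Println("enqueue sample with hash", hash, "value", s.value)
	}
}
```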

StevenYCChou commented 4 years ago

Addressed the comments, please take another look. Thanks!