Kong / kong-plugin-prometheus

Prometheus plugin for Kong - this plugin has been moved into https://github.com/Kong/kong, please open issues and PRs in that repo
Apache License 2.0

Exposed histogram data are not always consistent (concurrency issue) #42

Closed yskopets closed 4 years ago

yskopets commented 5 years ago

Summary

According to the Prometheus exposition format, a single histogram is represented by multiple time series (one time series per bucket, plus _sum and _count).

Bucket values are cumulative, so they must be consistent: each bucket's value must be greater than or equal to the value of the preceding bucket.

Apparently, the Kong Prometheus plugin occasionally violates this requirement.

Presumably, this is a concurrency issue: the histogram is being updated while a scrape is in progress.
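
To illustrate the suspected mechanism, here is a minimal, deterministic sketch in Python. It is not the plugin's code, and the names (BOUNDS, observe_steps, scrape) are hypothetical. It assumes each cumulative bucket counter is incremented separately, so a scrape that runs between two increments sees a half-applied observation.

```python
# Hypothetical model: one counter per cumulative bucket, updated one at a time.
BOUNDS = [100.0, 1000.0, float("inf")]


def observe_steps(counters, value):
    """Record one observation, yielding after every single bucket increment."""
    for bound in BOUNDS:
        if value <= bound:
            counters[bound] += 1
            yield  # a concurrent scrape could run at this point


def scrape(counters):
    """Read the bucket counters in ascending order of their upper bound."""
    return [counters[bound] for bound in BOUNDS]


counters = {bound: 5 for bound in BOUNDS}

update = observe_steps(counters, 50.0)
next(update)            # only the le=100 bucket has been incremented so far
mid = scrape(counters)  # snapshot taken in the middle of the update

for _ in update:        # let the observation finish
    pass
final = scrape(counters)

print("mid-update scrape:", mid)
print("after update:     ", final)
```

In this model the mid-update snapshot has a lower bucket that is larger than a higher one, which is the same kind of inconsistency as reported here, while the snapshot taken after the update completes is monotonic again.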

Steps to reproduce

  1. Configure the Prometheus plugin (e.g., curl -X POST http://localhost:8001/plugins --data "name=prometheus")
  2. Create a concurrent load on the Kong proxy (e.g., ab -t 60 -c 20 -H 'Host: www.example.org' http://localhost:8000/)
  3. Scrape metrics while the load test is in progress (e.g., curl http://localhost:8001/metrics)

Occasionally, the returned values are inconsistent, e.g.:

kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="00001.0"} 569
kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="00002.0"} 2221
...
kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="00300.0"} 195204
kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="00400.0"} 195228
kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="00500.0"} 195230
kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="01000.0"} 195230
kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="02000.0"} 195230
kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="05000.0"} 195230
kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="10000.0"} 195230
kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="30000.0"} 195229
kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="60000.0"} 195229
kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="+Inf"} 195230
kong_latency_count{type="upstream",service="kong-demo-1.nginx.80"} 195230
kong_latency_sum{type="upstream",service="kong-demo-1.nginx.80"} 4482078

Notice that the values for buckets le="30000.0" and le="60000.0" are lower than the value for bucket le="10000.0".
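
Spotting such a dip by eye is tedious, so a small checker can help when reproducing this. The following Python sketch is an illustration only: the function name find_bucket_violations is made up, and the regex-based parsing assumes simple label values without embedded commas, braces, or escaped quotes. It groups bucket samples by metric name and remaining labels, sorts them by le, and reports every bucket whose value is lower than an earlier one.

```python
import re

# Matches one histogram bucket sample: name_bucket{labels} value
BUCKET_RE = re.compile(r'^(\w+)_bucket\{(.*)\}\s+(\S+)$')
# Extracts the le label from the label set
LE_RE = re.compile(r'(?:^|,)le="([^"]*)"')


def find_bucket_violations(text):
    """Return (series, le, value, running_max) for each non-monotonic bucket."""
    groups = {}
    for line in text.splitlines():
        match = BUCKET_RE.match(line.strip())
        if not match:
            continue
        name, labels, value = match.groups()
        le = LE_RE.search(labels)
        if le is None:
            continue
        # Buckets of one histogram share the name and all labels except le.
        series = (name, LE_RE.sub("", labels))
        groups.setdefault(series, []).append(
            (float(le.group(1)), le.group(1), float(value))
        )

    violations = []
    for series, buckets in groups.items():
        running_max = float("-inf")
        for _, le_label, value in sorted(buckets):  # ascending upper bound
            if value < running_max:
                violations.append((series, le_label, value, running_max))
            running_max = max(running_max, value)
    return violations


# A few of the lines from the scrape output quoted in this report.
SAMPLE = '''
kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="05000.0"} 195230
kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="10000.0"} 195230
kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="30000.0"} 195229
kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="60000.0"} 195229
kong_latency_bucket{type="upstream",service="kong-demo-1.nginx.80",le="+Inf"} 195230
'''

for series, le_label, value, running_max in find_bucket_violations(SAMPLE):
    print(f"le={le_label}: {value:.0f} is lower than an earlier bucket ({running_max:.0f})")
```

Run against the excerpt above, it flags the le="30000.0" and le="60000.0" buckets. The same function could be applied to the full body returned by the /metrics endpoint during a load test.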

hbagdi commented 4 years ago

This shouldn't be the case anymore with recent performance improvements that fffonion has put in. Thanks for the report.