cncf / demo

Demo of CNCF technologies
https://cncf.io
Apache License 2.0

Prometheus resource usage "observer effect" #196

Closed namliz closed 7 years ago

namliz commented 7 years ago

Prometheus has been performing admirably with the demo for a while now: the number of data points written to it is relatively small despite the varied workload, so this was expected.

It turns out, according to prometheus/prometheus#455, that the amount of resources it uses is bound not just by how much is written into it but also by how heavily it is queried. For instance, if you open Grafana in a dozen tabs you can see memory start to climb (it's a heavy dashboard).
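One way to watch this effect is to ask Prometheus for its own resident memory while the dashboards are open. The following is a minimal sketch, not what the demo actually runs: it assumes Prometheus scrapes itself under a job label of `prometheus` and is reachable at `http://localhost:9090`, and the helper names are hypothetical. It only builds the instant-query URL and parses a response body, so fetching is left to whatever HTTP client you prefer.

```python
import json
from urllib.parse import urlencode

# Assumption: Prometheus scrapes itself under job="prometheus".
SELF_MEMORY_QUERY = 'process_resident_memory_bytes{job="prometheus"}'


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def resident_bytes(response_body):
    """Map each instance to its resident memory from an instant-query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    # Each vector sample is {"metric": {...}, "value": [timestamp, "string value"]}.
    return {
        sample["metric"].get("instance", ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


if __name__ == "__main__":
    # Hypothetical address; adjust to wherever Prometheus is exposed.
    print(build_query_url("http://localhost:9090", SELF_MEMORY_QUERY))
```

Polling this once with a single Grafana tab open and again with a dozen should make the query-driven growth visible without relying on the dashboard that is itself causing the load.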

The demo also recently added a sidecar that logs info from Prometheus to a cncfdemo backend. This increased the amount of resources used, which is obvious in retrospect.

Finally, until now Prometheus was just deployed as a regular pod in a 'monitoring' namespace, so it could end up on any node, including memory-constrained ones (the demo overloads some nodes by design). This causes a sort of observer effect and occasionally skews the results in a pronounced and strange way.

The obvious conclusion is to pin Prometheus and other crucial infra pods to some reserved nodes with plenty of headroom.
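As a rough sketch of what that pinning could look like, the snippet below adds a `nodeSelector` and a matching toleration to a pod spec and prints it as JSON, which `kubectl apply -f` accepts. The label `role=infra`, the taint key `dedicated`, and the helper name are assumptions made for illustration; they are not taken from the demo's manifests.

```python
import copy
import json


def pin_to_reserved_nodes(pod_spec, label_key="role", label_value="infra",
                          taint_key="dedicated"):
    """Return a copy of pod_spec that only schedules onto reserved nodes.

    The label and taint names are hypothetical; use whatever the cluster defines.
    """
    pinned = copy.deepcopy(pod_spec)
    # nodeSelector restricts scheduling to nodes carrying this label.
    pinned.setdefault("nodeSelector", {})[label_key] = label_value
    # The toleration lets the pod land on nodes tainted to repel other workloads.
    pinned.setdefault("tolerations", []).append({
        "key": taint_key,
        "operator": "Equal",
        "value": label_value,
        "effect": "NoSchedule",
    })
    return pinned


if __name__ == "__main__":
    # Minimal pod spec for illustration only.
    spec = {"containers": [{"name": "prometheus", "image": "prom/prometheus"}]}
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "prometheus", "namespace": "monitoring"},
        "spec": pin_to_reserved_nodes(spec),
    }
    print(json.dumps(pod, indent=2))
```

For this to take effect, the reserved nodes would need the corresponding label and taint, for example via `kubectl label nodes <node> role=infra` and `kubectl taint nodes <node> dedicated=infra:NoSchedule`. The taint is what keeps the demo's overloading workloads off those nodes; the selector alone only steers Prometheus toward them.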