Symantec / Dominator

The Dominator Config Management and Image Deployment System
Apache License 2.0

Add support for Prometheus metrics #432

Open SuperQ opened 6 years ago

SuperQ commented 6 years ago

It would be nice if the various services exposed a Prometheus /metrics endpoint. This would allow for easy monitoring and alerting, which is especially useful for Kubernetes users, who very often use Prometheus.

As the project is in Go, we have an efficient library for instrumenting code.
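
To make the request concrete, here is a minimal sketch of what a /metrics endpoint serves in the Prometheus text exposition format. It deliberately hand-writes the format with the standard library only, so it is not the Prometheus client library mentioned above, and the metric name `example_requests_total` is a made-up placeholder rather than anything Dominator exposes.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// requestsTotal is a hypothetical counter used only for this sketch.
var requestsTotal uint64

// render formats one counter in the Prometheus text exposition format.
// The metric name is an assumption made for illustration.
func render(n uint64) string {
	return fmt.Sprintf("# HELP example_requests_total Requests handled (hypothetical metric).\n"+
		"# TYPE example_requests_total counter\n"+
		"example_requests_total %d\n", n)
}

// metricsHandler serves the current counter value as plain text.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
	fmt.Fprint(w, render(atomic.LoadUint64(&requestsTotal)))
}

// scrape calls the handler in-process and returns the response body,
// standing in for a Prometheus server fetching /metrics.
func scrape() string {
	rec := httptest.NewRecorder()
	metricsHandler(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Body.String()
}

func main() {
	atomic.AddUint64(&requestsTotal, 3)
	fmt.Print(scrape())
}
```

In a real service the handler would be registered with `http.HandleFunc("/metrics", metricsHandler)`, and the client library would take care of formatting, labels and metric types.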

rgooch commented 6 years ago

I'm not really keen on that. I consider the tricorder/Scotty ecosystem a better solution: it supports more than floats and has a better UI for humans to discover and explore the metrics an application makes available. It also follows the vision of more integrations in the metrics space, such as integrating with the health-agent for health checking during rollouts. Also, I don't want to add another code dependency and bloat the system. Note too that the tricorder library already creates a /metrics endpoint.

What might be a good alternative is to add the ability to push metrics from Scotty to Prometheus. The advantage of this approach is that none of the Dominator ecosystem components need to be changed.

Please take a look:
https://github.com/Symantec/scotty
https://docs.google.com/document/d/e/2PACX-1vQPhkHYiLK7aKHLECa9EFtSCBSPK-obgGB8C66d72Kl-ej9NikRKYGsuFj1R9aDTlGiZA7OmXrVw8P3/pub

SuperQ commented 6 years ago

You may be interested in the OpenMetrics format that is being worked on. We are adding support for uint64 there, which covers a couple of use cases where float64 isn't sufficient.
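
As a small illustration of where float64 falls short: a float64 has a 53-bit significand, so integers above 2^53 are no longer all representable. The sketch below is plain Go with no metrics library involved, and the helper name `roundTrips` is made up for the example.

```go
package main

import "fmt"

// roundTrips reports whether v survives conversion to float64 and back.
// It is only meant for values well below 2^64, where the conversion back
// to uint64 is always in range.
func roundTrips(v uint64) bool {
	return uint64(float64(v)) == v
}

func main() {
	const limit = uint64(1) << 53 // largest power of two with exact neighbours below it

	fmt.Println(roundTrips(limit))     // true
	fmt.Println(roundTrips(limit + 1)) // false: rounds to 2^53
}
```

A counter of bytes or nanoseconds stored as uint64 can exceed 2^53, at which point a float64 sample silently loses the low bits.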

SuperQ commented 6 years ago

You may also be interested in our TSDB library, as it's more than 15x more memory- and storage-efficient than the TSDB proposed in the scotty design doc. The key trick is that it separates sample values from metric indexing. This allows each 1k series block to contain only samples and timestamps, in a compressed, double-delta encoded format. This reduces the space requirement to less than 1.1 bytes per sample (float64 value + uint64 epoch ms timestamp).
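
To show why double-delta encoding helps, here is a rough sketch of the idea applied to timestamps only. It is not the TSDB library's code: the function names are invented, the sample timestamps are made up, and the bit-packing that turns the small numbers into few bits is left out.

```go
package main

import "fmt"

// doubleDelta stores the first timestamp, then the first delta,
// then the difference between each delta and the previous one.
func doubleDelta(ts []int64) []int64 {
	out := make([]int64, len(ts))
	var prev, prevDelta int64
	for i, t := range ts {
		switch i {
		case 0:
			out[i] = t
		case 1:
			prevDelta = t - prev
			out[i] = prevDelta
		default:
			delta := t - prev
			out[i] = delta - prevDelta
			prevDelta = delta
		}
		prev = t
	}
	return out
}

// undoDoubleDelta reverses doubleDelta and returns the original timestamps.
func undoDoubleDelta(enc []int64) []int64 {
	out := make([]int64, len(enc))
	var prev, prevDelta int64
	for i, v := range enc {
		switch i {
		case 0:
			prev = v
		case 1:
			prevDelta = v
			prev += prevDelta
		default:
			prevDelta += v
			prev += prevDelta
		}
		out[i] = prev
	}
	return out
}

func main() {
	// Hypothetical scrape times in epoch ms, about 15s apart with 1ms of jitter.
	ts := []int64{1000, 16000, 31000, 46001, 61001}
	enc := doubleDelta(ts)
	fmt.Println(enc)                  // [1000 15000 0 1 -1]
	fmt.Println(undoDoubleDelta(enc)) // [1000 16000 31000 46001 61001]
}
```

With a regular scrape interval almost every encoded value is zero or close to it, which is what lets a bit-level encoder spend very little space per timestamp.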

There's a good talk on the design from PromCon last year: https://promcon.io/2017-munich/talks/storing-16-bytes-at-scale/