Open SuperQ opened 8 years ago
@SuperQ thank you -- can you please elaborate? I'm actually not familiar with Prometheus. What would be the benefits to gh-ost directly reporting from the app?
Actually, I'm having this exact same discussion on orchestrator. It supports graphite, but will now need to support datadog. And then maybe something else. I'm wondering whether the tool should explicitly support these, or maybe it should export metrics such that external monitoring tools read them.
Any time the internal state of the app is updated (for example starting, pausing, errors, copy progress), gh-ost would update counters, and the Prometheus library allows these to be polled from the outside.
Then things can be graphed and displayed, or alerts written to discover stalled gh-ost jobs.
The Prometheus output format is very generic and can be used to feed other systems. It's not done yet, but we're working on splitting the prometheus library into the counter parts and exposition parts in order to make this easier. (Our golang counter code is very well optimized)
Any time the internal state of the app is updated, for example starting, pausing, errors, copy progress, update counters and the Prometheus library allows these to be polled from the outside.
To clarify, gh-ost would need to proactively tell Prometheus about any state change, much like it would write to graphite. The "polled from the outside" is unrelated to gh-ost. Is that correct?
No, Prometheus is a polling-type metrics collector (think SNMP, but way better). gh-ost simply keeps track of the counters and Prometheus polls it. I'll look over the current code and see what needs to be done.
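To make the pull model concrete, here is a minimal stdlib-only sketch of what "the app keeps counters, the collector polls them" could look like. It is not the Prometheus client library and not gh-ost code: the `counter` type, the `render` and `metricsHandler` helpers, and the metric name `ghost_rows_copied_total` are all made up for illustration. The app only increments an in-memory counter; nothing is pushed anywhere, and the value is formatted in the Prometheus text exposition format only when something scrapes the HTTP endpoint. The `main` function uses a throwaway test server to simulate a single scrape.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// counter is a hypothetical in-memory counter that is safe for concurrent use.
type counter struct{ v int64 }

func (c *counter) Add(n int64)  { atomic.AddInt64(&c.v, n) }
func (c *counter) Value() int64 { return atomic.LoadInt64(&c.v) }

// render formats one counter in the Prometheus text exposition format.
func render(name string, c *counter) string {
	return fmt.Sprintf("# TYPE %s counter\n%s %d\n", name, name, c.Value())
}

// metricsHandler serves the counter's current value each time it is scraped.
func metricsHandler(name string, c *counter) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		io.WriteString(w, render(name, c))
	})
}

func main() {
	// The app side: update the counter as work progresses. No push happens here.
	rowsCopied := &counter{}
	rowsCopied.Add(1000)

	// The collector side, simulated: one HTTP GET against the metrics endpoint.
	srv := httptest.NewServer(metricsHandler("ghost_rows_copied_total", rowsCopied))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(body))
}
```

In a real integration the client library would presumably replace the hand-written counter and formatting, but the division of labour stays the same: the app updates values, the collector decides when to read them.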
Recently implemented in orchestrator, I'd use go-metrics/exp to export in expvar-like format via /debug/metrics
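For readers who have not used expvar, here is a rough sketch of that style of export using only Go's standard `expvar` package rather than go-metrics/exp. The `migrationStats` type and the `ghost_*` variable names are hypothetical. Published variables are served as a single JSON document (the standard library registers it at /debug/vars), which an external tool can then read and parse.

```go
package main

import (
	"encoding/json"
	"expvar"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// migrationStats is a hypothetical bundle of counters for one migration.
type migrationStats struct {
	RowsCopied expvar.Int
	Errors     expvar.Int
}

// publish registers the counters with expvar. Call it once per name:
// expvar.Publish panics if a name is registered twice.
func (s *migrationStats) publish(prefix string) {
	expvar.Publish(prefix+"rows_copied", &s.RowsCopied)
	expvar.Publish(prefix+"errors", &s.Errors)
}

func main() {
	stats := &migrationStats{}
	stats.publish("ghost_")
	stats.RowsCopied.Add(1000)

	// Simulate an external tool reading the JSON document served by expvar.
	srv := httptest.NewServer(expvar.Handler())
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var vars map[string]json.RawMessage
	if err := json.NewDecoder(resp.Body).Decode(&vars); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("ghost_rows_copied = %s\n", vars["ghost_rows_copied"])
}
```

The output here is JSON, not the line-based text format that Prometheus scrapes, so a Prometheus server cannot read this endpoint directly without some translation layer in between.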
expvar is not Prometheus compatible. :-/
It would be good to integrate the Prometheus client library for monitoring of long-running jobs.