oxedions closed this issue 3 years ago.
Hi @oxedions, nice question and valid remarks :+1:
1) By design, if pacemaker (or the service behind any other collector) is down, the ha_cluster_exporter will not gather and expose its metrics.
We raise an error only if 0 collectors are registered. (@stefanotorresi maybe we could consider not raising an error if we have 0 collectors and just doing nothing.)
2) Regarding point 2: basically, if other components are available before the pacemaker collector is running, you should be able to do it.
Also, if node 1 goes down for some reason, the exporter on the 2nd node can currently catch this.
Let me know if it helps.
Dario
@oxedions The exporter itself reports a metric exposing whether the collector could collect the data or not: https://github.com/ClusterLabs/ha_cluster_exporter/blob/master/doc/metrics.md#ha_cluster_scrape_success. Also, if Prometheus cannot scrape a target (for example, because the exporter is not reachable), it reports this through a metric called up (see: https://prometheus.io/docs/concepts/jobs_instances/). In both cases, you can create alerts based on these metrics.
(@stefanotorresi maybe we could consider not raising an error if we have 0 collectors and just doing nothing.)
This is how it used to work, and we found that behaviour very unpredictable. If there are no services to introspect, the exporter fails outright so that the up metric reports the failure; there is no point in keeping the exporter running without exporting anything.
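The startup behaviour described above (collectors are only registered when their service is available, and the exporter refuses to run with none) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the exporter's actual Go code: `register_collectors`, `NoCollectorsError` and the idea of one availability "probe" per collector are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List


class NoCollectorsError(RuntimeError):
    """Raised when not a single collector could be registered (hypothetical name)."""


def register_collectors(probes: Dict[str, Callable[[], bool]]) -> List[str]:
    """Register every collector whose backing service is available.

    `probes` maps a collector name (e.g. "pacemaker", "corosync") to a
    zero-argument callable returning True when the service it would
    introspect is present on this node (an assumption for this sketch).
    """
    registered = [name for name, probe in probes.items() if probe()]
    if not registered:
        # Fail outright instead of idling: a process exporting nothing would
        # still look healthy, whereas a dead exporter makes Prometheus'
        # `up` metric drop to 0 for this target.
        raise NoCollectorsError("no collector could be registered")
    return registered


if __name__ == "__main__":
    # pacemaker is down, corosync is up: only corosync gets registered.
    print(register_collectors({"pacemaker": lambda: False, "corosync": lambda: True}))
    # Nothing available: the exporter refuses to start.
    try:
        register_collectors({"pacemaker": lambda: False})
    except NoCollectorsError as err:
        print("startup failed:", err)
```

Raising instead of returning an empty list is the design choice discussed in the thread: it turns "nothing to export" into a visible failure rather than a silently idle process.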
Dear @MalloZup, Dear @diegoakechi, Dear @stefanotorresi,
So I can monitor ha_cluster_scrape_success; this is the value I was looking for.
By default, I start the ha_exporter at boot, but not corosync/pacemaker, which is why I forked your service file.
I am doing this because, in our configuration, the exporter is expected to run at all times, even when nothing is exported. Having nothing exported (i.e. ha_cluster_scrape_success = 0 while up = 1 for the ha_cluster target) is also a useful signal for us: it means the HA cluster is down for some reason but the exporter is still alive, so there is no need to worry about the exporter, only about the HA stack.
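The interpretation above combines two signals: Prometheus' own up metric for the exporter target and the exporter's ha_cluster_scrape_success. A minimal Python sketch of that decision logic follows, assuming ha_cluster_scrape_success is reported per collector (one value per collector label). `classify_target` is a hypothetical helper for illustration; in practice this logic would live in Prometheus alerting rules.

```python
from typing import Dict


def classify_target(up: int, scrape_success: Dict[str, int]) -> str:
    """Interpret one scrape of an ha_cluster_exporter target (hypothetical helper).

    up: Prometheus' synthetic `up` sample for the target (1 = scrape worked).
    scrape_success: assumed mapping of collector label -> ha_cluster_scrape_success.
    """
    if up == 0:
        # Prometheus could not scrape the exporter at all: look at the exporter.
        return "exporter down"
    failed = sorted(name for name, ok in scrape_success.items() if ok == 0)
    if failed:
        # Exporter alive, but some collectors could not read cluster state:
        # look at the HA stack, not at the exporter.
        return "cluster problem: " + ", ".join(failed)
    return "ok"


print(classify_target(1, {"pacemaker": 0, "corosync": 0}))
```

The last line prints `cluster problem: corosync, pacemaker`, which is exactly the "exporter alive, HA down" case described above.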
Many thanks for these answers 😊 And many thanks for the exporter and the dashboard.
Well, in fact we do start the ha_exporter in a clone resource now.
I guess I can close this then. :v:
Yes, thanks a lot ! 😊
Hi !
First, thanks a lot for this HA exporter. We are now using it as a base exporter for our HA. 😊
I had a question regarding the fact that you start the exporter with a dependency on pacemaker in your service file.
https://github.com/ClusterLabs/ha_cluster_exporter/blob/master/ha_cluster_exporter.service
This raises 2 questions on my side:
1) What happens when pacemaker is down? What does the exporter output then? I am asking because we use this exporter to collect metrics, but we also want to fire alerts when something fails or is down.
2) Is this dependency on pacemaker needed in the service file? For example, I may want to start the exporter before starting the HA cluster. By default, the HA cluster is not enabled on my cluster (so it does not start at boot), since I prefer it to stay down after a crash rather than restart automatically without investigation. This dependency prevents me from doing that.
With my best regards
Ox