ClusterLabs / ha_cluster_exporter

Prometheus exporter for Pacemaker based Linux HA clusters
Apache License 2.0
79 stars 35 forks source link

Implement SBD watchdog and msgwait metrics #174

Closed MalloZup closed 4 years ago

MalloZup commented 4 years ago

todo:

Note:

I have implemented the metric like

ha_cluster_sbd_devices{device="/dev/vdd",status="healthy"} 1
ha_cluster_sbd_devices_watchdog_timeout{device="/dev/vdc"} 60
ha_cluster_sbd_devices_msgwait_timeout{device="/dev/vdd"} 120

I didn't found any useful doc about the loop and allocate metric so I even wondered if we should expose them. (In fact I haven't but if they are meant to be useful, we should definitely document this in the suse-doc) See: https://documentation.suse.com/sle-ha/15-SP2/html/SLE-HA-all/cha-ha-storage-protect.html#sec-ha-storage-protect-watchdog-timings

@diegoakechi cc @nick-wang cc

I personally think that if we are exposing only 2 metrics we can create for each one a new metric without label.

If we want to expose 4 then I will refactor accordingly with a type label. I am ok with both directions

diegoakechi commented 4 years ago

@MalloZup How are these kind of metric being formatted on other cases or maybe on other exporters? I think the usage of label more flexible and elegant, but at the same time it might make harder for the consumer. Imagine that we use label and implement only 2 metrics. When a third label is introduced we might break dashboards that might not have consistent filters, while a new metric would be ignored.

With that my vote is for separated metrics.

stefanotorresi commented 4 years ago

@diegoakechi for these kind of use cases, the best practice is exactly the opposite of using separate metrics. See https://prometheus.io/docs/practices/instrumentation/#use-labels

diegoakechi commented 4 years ago

@diegoakechi for these kind of use cases, the best practice is exactly the opposite of using separate metrics. See https://prometheus.io/docs/practices/instrumentation/#use-labels

Sure, in that case lets go this way. Also, make sure to include all the metrics, so we avoid any filtering problem in the future. Thanks @stefanotorresi

MalloZup commented 4 years ago

@stefanotorresi I'm ok to use the type even for 2 metric.

ha_cluster_sbd_device_timeout{type="watchdog",device="/dev/vdc" 1
ha_cluster_sbd_device_timeout{type="msgwait",device="/dev/vdc"} 1

as follow-up with @yan-gao discussion we don't need to expose the other 2 metrics since they are more confusing then helping

thx for reviews. I will rework this during the day.

stefanotorresi commented 4 years ago

as follow-up with @yan-gao discussion we don't need to expose the other 2 metrics since they are more confusing then helping

okay, sounds good to me!