ClusterLabs / ha_cluster_exporter

Prometheus exporter for Pacemaker based Linux HA clusters
Apache License 2.0

Include Cluster Attributes Metrics #162

Closed diegoakechi closed 4 years ago

diegoakechi commented 4 years ago

Some resource agents use cluster attributes to communicate between nodes and to store internal metadata that they rely on to make decisions. The ha_cluster_exporter should expose these attributes, since they would allow better alerting and troubleshooting when the cluster takes action. One example is the SAPHanaSR resource agent, which uses the attributes below:

Node Attributes:
* Node hana01:
    + hana_prd_clone_state              : PROMOTED  
    + hana_prd_op_mode                  : logreplay 
    + hana_prd_remoteHost               : hana02    
    + hana_prd_roles                    : 4:P:master1:master:worker:master
    + hana_prd_site                     : PRIMARY_SITE_NAME
    + hana_prd_srmode                   : sync
    + hana_prd_sync_state               : PRIM
    + hana_prd_version                  : 2.00.040.00.1553674765
    + hana_prd_vhost                    : hana01    
    + lpa_prd_lpt                       : 1592233031
    + master-rsc_SAPHana_PRD_HDB00      : 150
* Node hana02:
    + hana_prd_clone_state              : DEMOTED   
    + hana_prd_op_mode                  : logreplay 
    + hana_prd_remoteHost               : hana01    
    + hana_prd_roles                    : 4:S:master1:master:worker:master
    + hana_prd_site                     : SECONDARY_SITE_NAME
    + hana_prd_srmode                   : sync
    + hana_prd_sync_state               : SFAIL
    + hana_prd_version                  : 2.00.040.00.1553674765
    + hana_prd_vhost                    : hana02    
    + lpa_prd_lpt                       : 10
    + master-rsc_SAPHana_PRD_HDB00      : -INFINITY 

Depending on how the resource agent sets some of these values (such as the roles), it will trigger a failover, block a possible failover, and so on.
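
As a rough illustration of what exposing these attributes could look like, the sketch below parses a listing in the format shown above and renders each attribute as a Prometheus-style sample. The metric name `ha_cluster_node_attribute`, the label names, and the helper functions are hypothetical and only for illustration; they are not the exporter's actual implementation. Since most of the values are strings, the sketch carries the value in a label and sets the sample to a constant 1.

```python
import re

# Matches node header lines such as "* Node hana01:".
NODE_RE = re.compile(r"^\*\s+Node\s+(\S+?):\s*$")
# Matches attribute lines such as "+ name : value".
# The value may itself contain colons (e.g. hana_prd_roles).
ATTR_RE = re.compile(r"^\s*\+\s+(\S+)\s*:\s*(.*?)\s*$")

# Shortened sample taken from the listing in this issue.
SAMPLE = """\
Node Attributes:
* Node hana01:
    + hana_prd_roles                    : 4:P:master1:master:worker:master
    + hana_prd_sync_state               : PRIM
    + master-rsc_SAPHana_PRD_HDB00      : 150
* Node hana02:
    + hana_prd_roles                    : 4:S:master1:master:worker:master
    + hana_prd_sync_state               : SFAIL
    + master-rsc_SAPHana_PRD_HDB00      : -INFINITY
"""


def parse_node_attributes(text):
    """Return {node: {attribute: value}} parsed from a 'Node Attributes' listing."""
    nodes = {}
    current = None
    for line in text.splitlines():
        node_match = NODE_RE.match(line)
        if node_match:
            current = nodes.setdefault(node_match.group(1), {})
            continue
        attr_match = ATTR_RE.match(line)
        # Ignore attribute lines that appear before any node header.
        if attr_match and current is not None:
            current[attr_match.group(1)] = attr_match.group(2)
    return nodes


def escape_label(value):
    """Escape backslash, double quote, and newline for a Prometheus label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_metric_lines(nodes, metric="ha_cluster_node_attribute"):
    """Render one sample per attribute; the metric name is a hypothetical choice."""
    lines = []
    for node in sorted(nodes):
        for name in sorted(nodes[node]):
            value = escape_label(nodes[node][name])
            lines.append(
                f'{metric}{{node="{escape_label(node)}",'
                f'name="{escape_label(name)}",value="{value}"}} 1'
            )
    return lines


if __name__ == "__main__":
    for sample_line in to_metric_lines(parse_node_attributes(SAMPLE)):
        print(sample_line)
```

With samples shaped like this, an alert could match on, for example, `name="hana_prd_sync_state"` together with `value="SFAIL"`. Keeping the value in a label does mean a new series appears whenever an attribute changes, so frequently changing attributes such as `lpa_prd_lpt` might be better exposed as numeric values where they parse cleanly.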