SUSE / sap_host_exporter

Prometheus exporter for SAP systems (SAP S/4HANA and SAP NetWeaver hosts)
Apache License 2.0

Collected metric was collected before with the same name and label values #57

Closed: dgarcdu closed this issue 2 years ago

dgarcdu commented 3 years ago

Hello. I have recently deployed this exporter on several instances, and in some cases I cannot get any metrics. The exporters start and run without any apparent problems, but when I try to see what they are exposing via curl, I get the following:

curl localhost:9680/metrics
An error has occurred while serving metrics:

collected metric "sap_start_service_instances" { label:<name:"SID" value:"TGR" > label:<name:"features" value:"MESSAGESERVER|ENQUE" > label:<name:"instance_hostname" value:"saptgrcs" > label:<name:"instance_name" value:"ASCS01" > label:<name:"instance_number" value:"1" > label:<name:"start_priority" value:"1" > gauge:<value:2 > } was collected before with the same name and label values

The config file:

# The listening TCP/IP address and port.
address: "0.0.0.0"
port: "9680"

# The log level.
#
# Possible values, from less to most verbose: error, warn, info, debug.
log-level: "info"

# The path to a Unix Domain Socket to access SAPControl locally.
#
# This is usually /tmp/.sapstream5<instance number>13
#
# If this is specified, sap-control-url setting will be ignored.
# UDS connection doesn't require authentication
sap-control-uds: "/tmp/.sapstream50113"

In another instance:

curl http://localhost:9680/metrics
An error has occurred while serving metrics:

collected metric "sap_start_service_instances" { label:<name:"SID" value:"TGR" > label:<name:"features" value:"ENQREP" > label:<name:"instance_hostname" value:"saptgrers" > label:<name:"instance_name" value:"ERS02" > label:<name:"instance_number" value:"2" > label:<name:"start_priority" value:"0.5" > gauge:<value:1 > } was collected before with the same name and label values

And the corresponding config:

# The listening TCP/IP address and port.
address: "0.0.0.0"
port: "9680"

# The log level.
#
# Possible values, from less to most verbose: error, warn, info, debug.
log-level: "info"

# The path to a Unix Domain Socket to access SAPControl locally.
#
# This is usually /tmp/.sapstream5<instance number>13
#
# If this is specified, sap-control-url setting will be ignored.
# UDS connection doesn't require authentication
sap-control-uds: "/tmp/.sapstream50213"

On all the other instances, the exporters are running fine and serving metrics without any problem. It is just on these two that I'm getting the error messages.

AFAIK that error message usually appears when there are duplicated metrics at the source, but I do not know much about SAP systems and may be missing something here, so any help would be greatly appreciated.

Thanks in advance.

Hummdis commented 2 years ago

Hey @dgarcdu, I'm having the same problem. We have the SAP Host Exporter running on over 100 systems without issue, many with two or more exporters on the same host (one per SAP instance), each using different service files, config files, and so on. However, on just 10 systems I get this error message, each time for different SAP instance IDs and different ports.

Did you ever figure this out?

Cheers!

dgarcdu commented 2 years ago

Hi @Hummdis

I was not directly involved in the solution, but I'll explain it as best I can.

We have an ASCS/ERS cluster with two VMs, each hosting an ASCS and an ERS instance. These VMs are in different GCE zones.

We installed the SAP Host Exporter on each of the VMs, and the only exporters with the issue were those on the nodes that were part of the ASCS/ERS cluster. The exporters on the HANA instances did not show this problem.

Apparently something was misconfigured: our SAP team reached the conclusion that both cluster nodes were mounting /usr/sap/ permanently. That is a broken cluster setup, and it results in the SAP instances being reported twice. The fstab also showed the filesystems mounted statically, which is not supported because it leads to the phantom GRAY instances we observed (excerpt below, followed by a way to check for the duplication):

> 192.168.8.4:/grc-pro-trans      /usr/sap/trans  nfs  nofail,noauto  0  0
> 192.168.8.4:/grc-pro-software   /software       nfs  nofail,noauto  0  0
> 192.168.8.4:/grc-pro-sapdata    /usr/sap/PGR    nfs  nofail,noauto  0  0
> 192.168.8.4:/grc-pro-sapmnt     /sapmnt         nfs  nofail,noauto  0  0
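
This kind of duplication can be confirmed at the source, since the sap_start_service_instances metric appears to mirror the instance list that SAPControl reports. A minimal check, assuming it is run as the <sid>adm user (tgradm in the original report) and using the ASCS instance number 01 from the first error above:

# Ask the SAP start service for the full system instance list.
# If the same instance shows up twice (e.g. once GREEN and once as a
# phantom GRAY entry), the exporter ends up emitting duplicate series.
sapcontrol -nr 01 -function GetSystemInstanceList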

I don't have details on which solution our SAP team finally used, but the options they evaluated were either adding Filesystem resources to the cluster or using the sapstartsrv resource agent; a rough sketch of the first option follows.
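
For illustration only, a minimal sketch of the first option with the crm shell on a Pacemaker cluster, reusing the NFS export from the fstab above; the resource name, timeouts, and the grouping with the SAP instance resources are assumptions that would need adapting:

# Hypothetical Filesystem resource (fs_PGR) so that only the active
# node mounts the instance directory, instead of a static fstab entry.
crm configure primitive fs_PGR ocf:heartbeat:Filesystem \
    params device="192.168.8.4:/grc-pro-sapdata" directory="/usr/sap/PGR" fstype="nfs" \
    op monitor interval=20s timeout=40s
# It would then be grouped or colocated with the matching SAP instance
# resource so the mount moves together with the instance.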

Hope it helps.