Node attributes and Systemd units data not showing up in Grafana

pvaldria commented 3 years ago

Node attributes and systemd units data are not showing up in Grafana. Please see the attached screenshot. Is this a known issue? I have a Pacemaker/Corosync NFS HA cluster (active/passive) with a shared disk, using the SBD fencing agent.

I had to add the following to /etc/prometheus/prometheus.yml:

```yaml
- job_name: 'nfs-ha'
  scrape_interval: 5s
  static_configs:
    - targets: ['nfs-server-1.storage.nfs.oraclevcn.com:9664',
                'nfs-server-2.storage.nfs.oraclevcn.com:9664',
                'qdevice.storage.nfs.oraclevcn.com:9664',
                'nfs-server-1.storage.nfs.oraclevcn.com:9100',
                'nfs-server-2.storage.nfs.oraclevcn.com:9100',
                'qdevice.storage.nfs.oraclevcn.com:9100']
      labels:
        group: 'nfs-ha'
```

I installed ha_cluster_exporter using the steps below.

```shell
yum install -y -q git
curl -O https://objectstorage.us-ashburn-1.oraclecloud.com/xxxxxxxxxxxxxxx/go1.15.8.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.15.8.linux-amd64.tar.gz

echo '
export GOROOT="/usr/local/go"
export GOBIN="$HOME/go/bin"
mkdir -p $GOBIN
export PATH=$PATH:$GOROOT/bin:$GOBIN
' >> .bashrc
source ~/.bashrc
go version
go get github.com/golang/mock/mockgen

git clone https://github.com/ClusterLabs/ha_cluster_exporter
cd ha_cluster_exporter
make
make install

cat > /lib/systemd/system/ha_cluster_exporter.service << EOF
[Unit]
Description=Prometheus exporter for Pacemaker HA clusters metrics
After=network.target

[Service]
Type=simple
Restart=always
ExecStart=/root/go/bin/ha_cluster_exporter
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOF

systemctl start ha_cluster_exporter
```

pvaldria commented 3 years ago

More details:

Feb 7 12:15:10 nfs-server-2 systemd: Started Prometheus exporter for Pacemaker HA clusters metrics.

Feb 7 12:15:10 nfs-server-2 ha_cluster_exporter: time="2021-02-07T12:15:10Z" level=warning msg="Config File \"ha_cluster_exporter\" Not Found in \"[/ /.config /etc /usr/etc]\""

Feb 7 12:15:10 nfs-server-2 ha_cluster_exporter: time="2021-02-07T12:15:10Z" level=info msg="Default config values will be used"

Feb 7 12:15:10 nfs-server-2 ha_cluster_exporter: time="2021-02-07T12:15:10Z" level=warning msg="Registration failure: could not initialize 'drbd' collector: '/sbin/drbdsetup' does not exist"

Feb 7 12:15:10 nfs-server-2 ha_cluster_exporter: time="2021-02-07T12:15:10Z" level=info msg="'pacemaker' collector registered."

Feb 7 12:15:10 nfs-server-2 ha_cluster_exporter: time="2021-02-07T12:15:10Z" level=info msg="'corosync' collector registered."

Feb 7 12:15:10 nfs-server-2 ha_cluster_exporter: time="2021-02-07T12:15:10Z" level=info msg="'sbd' collector registered."

Feb 7 12:15:10 nfs-server-2 ha_cluster_exporter: time="2021-02-07T12:15:10Z" level=info msg="Serving metrics on 0.0.0.0:9664"

diegoakechi commented 3 years ago

Hi @pvaldria,

Systemd and other OS-related metrics are provided by the Prometheus node_exporter. Do you have it running on your system too? The ha_cluster_exporter is specialized to provide metrics for the ClusterLabs components only.
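To see which exporter serves which metric family, one option is to summarize a scrape by metric-name prefix. This is only a minimal sketch: `summarize_exporter` is a hypothetical helper name, and it assumes the usual naming where ha_cluster_exporter metrics start with `ha_cluster_` and node_exporter metrics start with `node_`.

```shell
# Hypothetical helper: read Prometheus text exposition on stdin and count
# samples by prefix. Assumes ha_cluster_exporter metrics start with
# "ha_cluster_" and node_exporter metrics start with "node_".
summarize_exporter() {
  awk '
    /^#/ || NF == 0 { next }      # skip HELP/TYPE comments and blank lines
    /^ha_cluster_/  { ha++ }      # samples from ha_cluster_exporter
    /^node_/        { node++ }    # samples from node_exporter
    END { printf "ha_cluster=%d node=%d\n", ha, node }
  '
}

# Usage (hostnames taken from this thread):
#   curl -s http://nfs-server-1.storage.nfs.oraclevcn.com:9664/metrics | summarize_exporter
#   curl -s http://nfs-server-1.storage.nfs.oraclevcn.com:9100/metrics | summarize_exporter
```

Port 9664 would be expected to report only `ha_cluster` samples and port 9100 only `node` samples, which is why the systemd panels depend on node_exporter rather than on ha_cluster_exporter.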

pvaldria commented 3 years ago

Yes, I have the node_exporter service running on all nodes. On the Grafana/Prometheus server, I have the following configuration.

The last job below ( - job_name: 'nfs-ha-cluster') is the one I added to display the HA details; it lists both port 9664 and port 9100.

```yaml
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  external_labels:
    region: region
    monitor: infrastructure
    replica: nfs-20210209-0706

alerting:
  alertmanagers:
    - static_configs:
        - targets:

rule_files:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'nfs_servers'
    scrape_interval: 5s
    static_configs:
      - targets: ['nfs-server-1.storage.nfs.oraclevcn.com:9100',
                  'nfs-server-2.storage.nfs.oraclevcn.com:9100']
        labels:
          group: 'nfs_servers'

  - job_name: 'quorum'
    scrape_interval: 5s
    static_configs:
      - targets: ['qdevice.storage.nfs.oraclevcn.com:9100']
        labels:
          group: 'quorum'

  - job_name: 'nfs-ha-cluster'
    scrape_interval: 5s
    static_configs:
      - targets: ['nfs-server-1.storage.nfs.oraclevcn.com:9664',
                  'nfs-server-2.storage.nfs.oraclevcn.com:9664',
                  'qdevice.storage.nfs.oraclevcn.com:9664',
                  'nfs-server-1.storage.nfs.oraclevcn.com:9100',
                  'nfs-server-2.storage.nfs.oraclevcn.com:9100',
                  'qdevice.storage.nfs.oraclevcn.com:9100']
        labels:
          group: 'nfs-ha-cluster'
```
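In the configuration above, the port 9100 targets appear both in their own jobs and again in 'nfs-ha-cluster', so Prometheus scrapes them twice and stores the series under two different `job` labels. This is not necessarily the cause of the missing panels, but it can confuse dashboards that filter on `job`. A rough way to spot such overlaps is sketched below; `duplicate_targets` is a hypothetical helper name, and it assumes targets are written as single-quoted `'host:port'` strings as in this file.

```shell
# Hypothetical helper: read a prometheus.yml on stdin and print every
# 'host:port' target that is listed more than once.
# Assumes targets are single-quoted, as in the configuration above.
duplicate_targets() {
  grep -oE "'[^' ]+:[0-9]+'" |   # extract each quoted host:port
    tr -d "'" |                  # drop the surrounding quotes
    sort |
    uniq -d                      # keep only entries that repeat
}

# Usage:
#   duplicate_targets < /etc/prometheus/prometheus.yml
```

Any host:port printed by this helper is being scraped by more than one job.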

diegoakechi commented 3 years ago

@pvaldria another check: did you enable the systemd collector in your node_exporter configuration? It is disabled by default.

https://github.com/prometheus/node_exporter#disabled-by-default
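A quick way to check this from the command line is to count the systemd samples in a node_exporter scrape. The sketch below assumes the collector's metrics are prefixed with `node_systemd_` (for example `node_systemd_unit_state`); `count_systemd_samples` is a hypothetical helper name. If the count is 0, node_exporter is most likely running without the `--collector.systemd` flag described in the README linked above.

```shell
# Hypothetical helper: read node_exporter text exposition on stdin and
# print how many sample lines belong to the systemd collector.
# Assumes systemd collector metrics are prefixed with "node_systemd_".
count_systemd_samples() {
  # grep -c prints 0 but exits non-zero when nothing matches;
  # "|| true" keeps the helper's exit status clean in that case.
  grep -c '^node_systemd_' || true
}

# Usage (hostname taken from this thread):
#   curl -s http://nfs-server-1.storage.nfs.oraclevcn.com:9100/metrics | count_systemd_samples
```

If the result is 0, adding `--collector.systemd` to the node_exporter command line (for example in the `ExecStart=` line of its systemd unit) and restarting the service should make the `node_systemd_*` series appear on the next scrape.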

ClusterLabs / ha_cluster_exporter

Node attributes and Systemd units data not showing up in Grafana #183