Closed ab-mohamed closed 3 years ago
hmm there should be 4 nodes in total up for netweaver, 2 of which are clustered. Is node exporter running in the PAS and AAS nodes?
@stefanotorresi I did not check it before destroying the environment.
Which command should I use to check the node exporter on PAS and AAS?
@stefanotorresi, the exporter systemd service works on PAS and AAS nodes in addition to the ASCS and ERS nodes:
dev-demo1-netweaver01:~ # systemctl status prometheus-node_exporter.service
● prometheus-node_exporter.service - Prometheus exporter for machine metrics
Loaded: loaded (/usr/lib/systemd/system/prometheus-node_exporter.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2021-10-20 11:51:18 UTC; 2h 21min ago
Docs: https://github.com/prometheus/node_exporter
Main PID: 3763 (node_exporter)
Tasks: 9
CGroup: /system.slice/prometheus-node_exporter.service
└─3763 /usr/bin/node_exporter --collector.systemd --no-collector.mdadm
Oct 20 11:51:18 dev-demo1-netweaver01 node_exporter[3763]: level=info ts=2021-10-20T11:51:18.816Z caller=node_exporter.go:113 collector=thermal_zone
Oct 20 11:51:18 dev-demo1-netweaver01 node_exporter[3763]: level=info ts=2021-10-20T11:51:18.816Z caller=node_exporter.go:113 collector=time
Oct 20 11:51:18 dev-demo1-netweaver01 node_exporter[3763]: level=info ts=2021-10-20T11:51:18.816Z caller=node_exporter.go:113 collector=timex
Oct 20 11:51:18 dev-demo1-netweaver01 node_exporter[3763]: level=info ts=2021-10-20T11:51:18.816Z caller=node_exporter.go:113 collector=udp_queues
Oct 20 11:51:18 dev-demo1-netweaver01 node_exporter[3763]: level=info ts=2021-10-20T11:51:18.816Z caller=node_exporter.go:113 collector=uname
Oct 20 11:51:18 dev-demo1-netweaver01 node_exporter[3763]: level=info ts=2021-10-20T11:51:18.816Z caller=node_exporter.go:113 collector=vmstat
Oct 20 11:51:18 dev-demo1-netweaver01 node_exporter[3763]: level=info ts=2021-10-20T11:51:18.816Z caller=node_exporter.go:113 collector=xfs
Oct 20 11:51:18 dev-demo1-netweaver01 node_exporter[3763]: level=info ts=2021-10-20T11:51:18.816Z caller=node_exporter.go:113 collector=zfs
Oct 20 11:51:18 dev-demo1-netweaver01 node_exporter[3763]: level=info ts=2021-10-20T11:51:18.816Z caller=node_exporter.go:195 msg="Listening on" address=:9100
Oct 20 11:51:18 dev-demo1-netweaver01 node_exporter[3763]: level=info ts=2021-10-20T11:51:18.817Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
dev-demo1-netweaver03:~ # systemctl status prometheus-node_exporter.service
● prometheus-node_exporter.service - Prometheus exporter for machine metrics
Loaded: loaded (/usr/lib/systemd/system/prometheus-node_exporter.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2021-10-20 11:50:20 UTC; 2h 22min ago
Docs: https://github.com/prometheus/node_exporter
Main PID: 3887 (node_exporter)
Tasks: 7
CGroup: /system.slice/prometheus-node_exporter.service
└─3887 /usr/bin/node_exporter --collector.systemd --no-collector.mdadm
Oct 20 11:50:20 dev-demo1-netweaver03 node_exporter[3887]: level=info ts=2021-10-20T11:50:20.792Z caller=node_exporter.go:113 collector=thermal_zone
Oct 20 11:50:20 dev-demo1-netweaver03 node_exporter[3887]: level=info ts=2021-10-20T11:50:20.792Z caller=node_exporter.go:113 collector=time
Oct 20 11:50:20 dev-demo1-netweaver03 node_exporter[3887]: level=info ts=2021-10-20T11:50:20.792Z caller=node_exporter.go:113 collector=timex
Oct 20 11:50:20 dev-demo1-netweaver03 node_exporter[3887]: level=info ts=2021-10-20T11:50:20.792Z caller=node_exporter.go:113 collector=udp_queues
Oct 20 11:50:20 dev-demo1-netweaver03 node_exporter[3887]: level=info ts=2021-10-20T11:50:20.792Z caller=node_exporter.go:113 collector=uname
Oct 20 11:50:20 dev-demo1-netweaver03 node_exporter[3887]: level=info ts=2021-10-20T11:50:20.792Z caller=node_exporter.go:113 collector=vmstat
Oct 20 11:50:20 dev-demo1-netweaver03 node_exporter[3887]: level=info ts=2021-10-20T11:50:20.792Z caller=node_exporter.go:113 collector=xfs
Oct 20 11:50:20 dev-demo1-netweaver03 node_exporter[3887]: level=info ts=2021-10-20T11:50:20.792Z caller=node_exporter.go:113 collector=zfs
Oct 20 11:50:20 dev-demo1-netweaver03 node_exporter[3887]: level=info ts=2021-10-20T11:50:20.792Z caller=node_exporter.go:195 msg="Listening on" address=:9100
Oct 20 11:50:20 dev-demo1-netweaver03 node_exporter[3887]: level=info ts=2021-10-20T11:50:20.792Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
dev-demo1-netweaver01:~ # ps -ef | grep exporter
root 1182 1073 0 14:11 pts/0 00:00:00 grep --color=auto exporter
prometh+ 3763 1 2 11:51 ? 00:03:13 /usr/bin/node_exporter --collector.systemd --no-collector.mdadm
root 18659 1 0 12:09 ? 00:00:43 /usr/bin/ha_cluster_exporter
root 19279 1 0 12:09 ? 00:00:11 /usr/bin/sap_host_exporter --config /etc/sap_host_exporter/HA1_ASCS00.yaml
dev-demo1-netweaver03:~ # ps -ef | grep exporter
prometh+ 3887 1 0 11:50 ? 00:00:00 /usr/bin/node_exporter --collector.systemd --no-collector.mdadm
root 24663 1 0 12:52 ? 00:00:00 /usr/bin/sap_host_exporter --config /etc/sap_host_exporter/HA1_PAS01.yaml
root 31498 31416 0 14:11 pts/0 00:00:00 grep --color=auto exporter
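As a rough aid for comparing nodes, the `ps -ef | grep exporter` listings above can be reduced to just the exporter binaries running on each host. This is only a sketch: `list_exporters` is a hypothetical helper name, and it assumes the default `ps -ef` column layout, where the command is the eighth field.

```shell
#!/usr/bin/env bash
# list_exporters (hypothetical helper): read `ps -ef` output on stdin and
# print the basename of every running *_exporter binary, one per line.
# Assumes the default `ps -ef` layout, where field 8 is the command.
list_exporters() {
  awk '
    {
      cmd = $8
      # Keep only commands whose path ends in "_exporter"; this also
      # skips the "grep --color=auto exporter" process itself.
      if (cmd ~ /_exporter$/) {
        n = split(cmd, parts, "/")
        print parts[n]
      }
    }
  ' | sort -u
}

# Usage on a node:
#   ps -ef | list_exporters
```

Run against the two listings above, this makes the difference visible at a glance: netweaver01 runs node_exporter, ha_cluster_exporter and sap_host_exporter, while netweaver03 runs only node_exporter and sap_host_exporter.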
@ab-mohamed Which branch are you running?
We had a few changes to the monitoring setup in develop lately.
@yeoldegrove the current ‘master’ branch.
@ab-mohamed can you confirm that this is fixed in develop?
@yeoldegrove Is this the correct repo for the develop branch?
ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:/ha-clustering:/sap-deployments:/devel/"
@yeoldegrove Using ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:/ha-clustering:/sap-deployments:/devel/", I can see that all NetWeaver nodes are up and running.
Can you please apply this fix to the master branch?
@yeoldegrove I can't see this fix in the master branch. I am still seeing the same behavior reported in https://github.com/SUSE/ha-sap-terraform-deployments/issues/781#issue-1031259562.
Used cloud platform: GCP
Used SLES4SAP version: SLES15SP2
Used client machine OS: Google Cloud Shell
Expected behavior vs observed behavior
Expected behavior: When the NetWeaver 7.5 HA cluster is deployed successfully, the ClusterLabs HA Multi-Cluster Overview dashboard should show 100% Up nodes for the NetWeaver HA cluster nodes.
Observed behavior: It shows 50% Up nodes for the NetWeaver HA cluster nodes.
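To sanity-check a figure like this outside the dashboard, here is a minimal sketch that turns a list of per-instance `up` values (1 for reachable, 0 for not) into a percentage. How the dashboard panel actually computes its number is not stated in this thread, so treat the plain average below as an assumption; `percent_up` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# percent_up (hypothetical helper): read "<instance> <up>" pairs on stdin,
# where <up> is 1 or 0, and print the share of instances that are up.
# Assumption: a plain average of up values; the dashboard may differ.
percent_up() {
  awk '
    NF >= 2 { total++; if ($2 == 1) up++ }
    END {
      if (total == 0) { print "no instances"; exit 1 }
      printf "%d%%\n", (up * 100) / total
    }
  '
}

# Usage with made-up instance names (only port 9100 comes from the logs):
#   printf "%s\n" "node-a:9100 1" "node-b:9100 1" "node-c:9100 0" "node-d:9100 0" | percent_up
```

With two of four instances reporting 1, this prints 50%, which matches the observed value if only half of the NetWeaver nodes are seen as up.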
How to reproduce
20.10.2021 10:43:07
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
disp+work, Dispatcher, GREEN, Running, 2021 10 20 09:01:22, 1:41:45, 17306
igswd_mt, IGS Watchdog, GREEN, Running, 2021 10 20 09:01:22, 1:41:45, 17307
gwrd, Gateway, GREEN, Running, 2021 10 20 09:01:23, 1:41:44, 17311
icman, ICM, GREEN, Running, 2021 10 20 09:01:23, 1:41:44, 17312