gluster / gluster-prometheus

Gluster monitoring using Prometheus
GNU Lesser General Public License v2.1

Some collectors do not work #165

Open kbelosevic opened 5 years ago

kbelosevic commented 5 years ago

We are experiencing an issue on our servers where some metrics, e.g. "gluster_up", are not available. The same issue occurs whether we use a gd1 remote host, the gd1 socket, or a gd2 host, and in any combination with caching enabled, different TTLs, and different cached functions.

Also, the log is always empty.

In short, some collectors export no metrics at all, for example:

[collectors.gluster_brick_status]
name = "gluster_brick_status"
sync-interval = 15
disabled = false

glusterd --version
glusterfs 6.3
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

/usr/sbin/gluster-exporter --version
version   : v0.3-dev.85.git9537825
go version: go1.12.7
go OS/arch: linux/amd64

uname -a
4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Configuration: /etc/gluster-exporter/gluster-exporter.toml

[globals]
gluster-cluster-id = ""
gluster-mgmt = "glusterd"
glusterd-dir = "/var/lib/glusterd"
gluster-binary-path = "gluster"
# If you want to connect to a remote gd1 host, set the variable gd1-remote-host
# However, using a remote host restrict the gluster cli to read-only commands
# The following collectors won't work in remote mode : gluster_volume_counts, gluster_volume_profile 
#gd1-remote-host = "localhost"
gd2-rest-endpoint = "http://localhost:24007"
#gd1-glusterd-sock = "/var/run/glusterd.socket"
port = 9189
metrics-path = "/metrics"
log-dir = "/var/log/gluster-exporter"
log-file = "exporter.log"
log-level = "debug"
# cache-ttl-in-sec = 0, disables caching
# cache-ttl-in-sec = 0
 cache-ttl-in-sec = 10
# by default caching is turned off
# to enable caching, add the function-name to 'cache-enabled-funcs' list
# supported functions are,
# 'IsLeader', 'LocalPeerID', 'VolumeInfo'
# 'EnableVolumeProfiling', 'HealInfo', 'Peers',
# 'Snapshots', 'VolumeBrickStatus', 'VolumeProfileInfo'
cache-enabled-funcs = [ 'IsLeader', 'LocalPeerID', 'VolumeInfo', 'EnableVolumeProfiling', 'HealInfo', 'Peers', 'Snapshots', 'VolumeBrickStatus', 'VolumeProfileInfo']

[collectors.gluster_ps]
name = "gluster_ps"
sync-interval = 5
disabled = false

[collectors.gluster_peer_counts]
name = "gluster_peer_counts"
sync-interval = 5
disabled = false

[collectors.gluster_peer_info]
name = "gluster_peer_info"
sync-interval = 5
disabled = false

[collectors.gluster_brick]
name = "gluster_brick"
sync-interval = 5
disabled = false

[collectors.gluster_brick_status]
name = "gluster_brick_status"
sync-interval = 15
disabled = false

[collectors.gluster_volume_counts]
name = "gluster_volume_counts"
sync-interval = 5
disabled = false

[collectors.gluster_volume_status]
name = "gluster_volume_status"
sync-interval = 5
disabled = false

[collectors.gluster_volume_heal]
name = "gluster_volume_heal"
sync-interval = 5
disabled = false

[collectors.gluster_volume_profile]
name = "gluster_volume_profile"
sync-interval = 5
disabled = false

/usr/lib/systemd/system/gluster-exporter.service

[Unit]
Description=Gluster Prometheus Exporter

[Service]
ExecStart=/usr/sbin/gluster-exporter --config=/etc/gluster-exporter/gluster-exporter.toml
KillMode=process

[Install]
WantedBy=multi-user.target

These are the only metrics that are exported:

curl -Ss localhost:9189/metrics 2>&1 | grep gluster |grep -v ^# | cut -d \{ -f1|sort|uniq
gluster_brick_capacity_bytes_total
gluster_brick_capacity_free_bytes
gluster_brick_capacity_used_bytes
gluster_brick_inodes_free
gluster_brick_inodes_total
gluster_brick_inodes_used
gluster_cpu_percentage
gluster_elapsed_time_seconds
gluster_memory_percentage
gluster_peer_connected
gluster_peer_count
gluster_peer_status
gluster_pv_count
gluster_resident_memory_bytes
gluster_subvol_capacity_total_bytes
gluster_subvol_capacity_used_bytes
gluster_vg_count
gluster_virtual_memory_bytes
gluster_volume_brick_free_bytes
gluster_volume_brick_free_inodes
gluster_volume_brick_pid
gluster_volume_brick_port
gluster_volume_brick_status
gluster_volume_brick_total_bytes
gluster_volume_brick_total_inodes
gluster_volume_status_brick_count
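A caveat on the pipeline above: cut -d \{ -f1 only isolates the metric name for samples that carry labels. An unlabelled sample such as "gluster_up 1" would come through with its value still attached. The sketch below strips everything from the first brace or space instead, so both forms reduce to the bare name. metric_names is a hypothetical helper, not part of gluster-prometheus.

```bash
#!/usr/bin/env bash
# metric_names: read Prometheus text exposition output on stdin and print each
# distinct metric name once. "# HELP" / "# TYPE" comment lines are skipped, and
# everything from the first '{' or space onward is removed, so labelled
# ("name{...} 1") and unlabelled ("name 1") samples both reduce to "name".
# LC_ALL=C makes the sort order independent of the locale.
metric_names() {
  grep -v '^#' | sed -e 's/[{ ].*//' | grep -v '^$' | LC_ALL=C sort -u
}

# Usage against a live exporter (port 9189 as in the configuration above):
#   curl -Ss localhost:9189/metrics | metric_names | grep '^gluster_'
```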

Gluster info:

gluster peer status
Number of Peers: 2

Hostname: host-1
Uuid: 0be511d4-1e81-4395-a712-e5f5def4c18b
State: Peer in Cluster (Connected)

Hostname: host-3
Uuid: 6baacbc3-240d-419a-aba9-e2f3cba3b40d
State: Peer in Cluster (Connected)
--------------------------------------------------------------
gluster volume info
Volume Name: gv_media
Type: Disperse
Volume ID: 9315f5bb-27a1-4575-8844-506e0bd77f43
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: host-1:/media/gluster/gv_media/brick
Brick2: host-2:/media/gluster/gv_media/brick
Brick3: host-3:/media/gluster/gv_media/brick
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.disable: on
transport.address-family: inet

kbelosevic commented 5 years ago

Any possible insights on this?

aravindavk commented 5 years ago

gluster_up is one of the cluster-level metrics, which are exported from only one node of the cluster.

Is it also missing from the exporters on the other nodes? I see there are three hosts in the cluster.

curl -Ss host-1:9189/metrics 2>&1 | grep gluster_up
curl -Ss host-2:9189/metrics 2>&1 | grep gluster_up
curl -Ss host-3:9189/metrics 2>&1 | grep gluster_up

Please confirm.
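To run the same check on every node in one go, something like the following could be used. It is a sketch: has_metric and check_hosts are hypothetical helper names, the host names come from this thread, and it assumes each exporter listens on port 9189 as in the configuration above. has_metric only matches actual samples, so a lone "# HELP" or "# TYPE" line for gluster_up does not count as the metric being exported.

```bash
#!/usr/bin/env bash
# has_metric NAME: succeed if the Prometheus text on stdin contains at least
# one sample line for NAME, with or without labels. Comment lines never match
# because they start with '#'. Output goes to /dev/null rather than using -q,
# so grep reads the whole response and curl does not report a failed write.
has_metric() {
  grep -E "^$1(\{| )" >/dev/null
}

# check_hosts HOST...: for each host, report whether the exporter on port 9189
# currently exports gluster_up.
check_hosts() {
  local host
  for host in "$@"; do
    if curl -Ss "$host:9189/metrics" | has_metric gluster_up; then
      echo "$host: gluster_up present"
    else
      echo "$host: gluster_up missing"
    fi
  done
}

# Usage:
#   check_hosts host-1 host-2 host-3
```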

kbelosevic commented 5 years ago

It is not exported on any of the nodes.

user@host-1:~$ curl localhost:9189/metrics 2>&1 | grep gluster_up
user@host-1:~$ curl localhost:9189/metrics 2>&1 | wc -l
203
user@host-2:~$ curl localhost:9189/metrics 2>&1 | grep gluster_up
user@host-2:~$ curl localhost:9189/metrics 2>&1 | wc -l
203
user@host-3:~$ curl localhost:9189/metrics 2>&1 | grep gluster_up
user@host-3:~$ curl localhost:9189/metrics 2>&1 | wc -l
213

belfo commented 4 years ago

Seeing exactly the same. Did you find the cause? (I have even fewer metrics.)

curl -Ss localhost:9713/metrics 2>&1 | grep gluster |grep -v ^# | cut -d { -f1|sort|uniq
gluster_brick_capacity_bytes_total
gluster_brick_capacity_free_bytes
gluster_brick_capacity_used_bytes
gluster_brick_inodes_free
gluster_brick_inodes_total
gluster_brick_inodes_used
gluster_peer_connected
gluster_peer_count
gluster_peer_status
gluster_subvol_capacity_total_bytes
gluster_subvol_capacity_used_bytes
gluster_volume_brick_free_bytes
gluster_volume_brick_free_inodes
gluster_volume_brick_pid
gluster_volume_brick_port
gluster_volume_brick_status
gluster_volume_brick_total_bytes
gluster_volume_brick_total_inodes
gluster_volume_status_brick_count

beakbite commented 4 years ago

I get exactly the same reduced list of metrics as @belfo.

Also, nothing is logged: even with log-level = "debug", the log file is created but stays empty.

RHEL 7.4, GlusterFS 3.8.4

benjulios commented 4 years ago

Same issue for me. I have two clusters, one production and one dev. Each cluster has 5 nodes, including 1 arbiter. Of those 10 nodes, only one reports the gluster_volume_up metric and many others. All of them were compiled the same way, with the same commands. Here's my version:

version   : v0.3-dev.93.git3ebaacc
go version: go1.14.4
go OS/arch: linux/amd64

DEVADM:/root#for SRV in DEV1 DEV2 DEV3 DEV4 ;do ssh $SRV "curl -Ss localhost:9713/metrics 2>&1 | grep gluster |grep -v ^# | cut -d { -f1|sort|uniq|wc -l" ;done
38
64 <<<< this one reports more metrics with exactly the same config!
38
38
PROD:/root#for SRV in PRD1 PRD1 PRD3 PRD4 ;do ssh $SRV "curl -Ss localhost:9713/metrics 2>&1 | grep gluster |grep -v ^# | cut -d { -f1|sort|uniq|wc -l" ;done
38
38
38
38

Edit: RHEL 7.6, uname -r => 3.10.0-957.21.3.el7.x86_64

benjulios commented 4 years ago

OK, it's now working for me. You have to install the exporter on ALL your nodes, even the arbiter, because only one of them exports the 'cluster-wide' metrics. The node in charge is selected from the peer list, which means even an arbiter can be selected.
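The selection described above can be sketched as follows. Treat the exact rule as an assumption: my understanding is that the gd1 backend's IsLeader check picks the online peer with the greatest UUID, but verify this against the exporter's source before relying on it. connected_peer_uuids and pick_leader are hypothetical helper names. The input format is the gluster peer status output shown earlier in this thread, and the local UUID is read from glusterd.info in the glusterd directory (/var/lib/glusterd in the configuration above).

```bash
#!/usr/bin/env bash
# ASSUMPTION: the leader is the online peer with the greatest UUID (compared as
# strings), counting the local node too. This sketches the idea only; it is
# not the exporter's own code.

# connected_peer_uuids: read `gluster peer status` output on stdin and print
# the UUID of every peer whose State line contains "(Connected)". The local
# node is not listed by `gluster peer status`, so it must be added separately.
connected_peer_uuids() {
  awk '/^Uuid:/ { uuid = $2 }
       /^State:/ && /\(Connected\)/ { print uuid }'
}

# pick_leader: read UUIDs, one per line, on stdin and print the greatest one.
# LC_ALL=C gives plain byte order, so the result does not depend on locale.
pick_leader() {
  LC_ALL=C sort | tail -n 1
}

# Usage on one node:
#   local_uuid=$(sed -n 's/^UUID=//p' /var/lib/glusterd/glusterd.info)
#   leader=$( { gluster peer status | connected_peer_uuids; echo "$local_uuid"; } | pick_leader )
#   if [ "$leader" = "$local_uuid" ]; then echo "this node should export cluster-wide metrics"; fi
```

If a rule like this holds, it also explains the symptom in this thread: when the selected node has no exporter running, no node exports the cluster-wide metrics.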

Here's the number of metrics exported by each node of my two clusters, PROD and DEV (one loop per cluster). You will notice that only one node per cluster exports more metrics:

root@adminserver:/root#for SRV in SRVDEV1 SRVDEV2 SRVDEV3 SRVDEV4 ARBITERDEV1 ;do ssh $SRV "curl -Ss localhost:9713/metrics 2>&1 | grep gluster |grep -v ^# | cut -d { -f1|sort|uniq|wc -l" ;done
38
64
38
38
38
root@adminserver:/root#for SRV in SRVPROD1 SRVPROD2 SRVPROD3 SRVPROD4 ARBITERPRD1 ;do ssh $SRV "curl -Ss localhost:9713/metrics 2>&1 | grep gluster |grep -v ^# | cut -d { -f1|sort|uniq|wc -l" ;done
38
38
38
38
64

Not sure if this is your case too, but I hope it helps.
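To find which node currently carries the cluster-wide metrics without eyeballing the counts, each count can be paired with its host name and the largest one picked. This is a sketch: busiest_node is a hypothetical helper, and the host names, port 9713, and ssh access are taken from the loops above.

```bash
#!/usr/bin/env bash
# busiest_node: read "host count" pairs on stdin and print the host with the
# highest count. The stable sort (-s) keeps input order for equal counts, so
# the first host listed wins a tie.
busiest_node() {
  sort -s -k2,2nr | head -n 1 | cut -d ' ' -f1
}

# Usage (same hosts and port as the loop above):
#   for SRV in SRVDEV1 SRVDEV2 SRVDEV3 SRVDEV4 ARBITERDEV1; do
#     printf '%s %s\n' "$SRV" "$(ssh "$SRV" "curl -Ss localhost:9713/metrics | grep '^gluster_' | sed 's/[{ ].*//' | sort -u | wc -l")"
#   done | busiest_node
```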

limiao2008 commented 3 years ago

I had the same problem and solved it; see https://github.com/gluster/gluster-prometheus/issues/147#issuecomment-743010344