gluster / gluster-prometheus

Gluster monitoring using Prometheus
GNU Lesser General Public License v2.1

volume profile measurement #147

Open onnorom opened 5 years ago

onnorom commented 5 years ago

Hi, I am wondering if there's anything I may be missing that needs to be enabled in order for me to get the gluster_volume_profile* measurements. I have set the necessary collectors in the /etc/gluster-exporter/gluster-exporter.toml file as below:

[collectors.gluster_volume_profile]
name = "gluster_volume_profile"
sync-interval = 5
disabled = false

[collectors.gluster_volume_counts]
name = "gluster_volume_counts"
sync-interval = 5
disabled = false

[collectors.gluster_volume_heal]
name = "gluster_volume_heal"
sync-interval = 5
disabled = false

However, I don't see any measurements collected with those names.

I do see the following measurements:

gluster_brick_capacity_bytes_total
gluster_brick_capacity_free_bytes
gluster_brick_capacity_used_bytes
gluster_brick_inodes_free
gluster_brick_inodes_total
gluster_brick_inodes_used
gluster_brick_lv_metadata_percent
gluster_brick_lv_metadata_size_bytes
gluster_brick_lv_percent
gluster_brick_lv_size_bytes
gluster_brick_up
gluster_cpu_percentage
gluster_elapsed_time_seconds
gluster_memory_percentage
gluster_process:gluster_cpu_percentage:avg1h
gluster_process:gluster_elapsed_time_seconds:rate5m
gluster_process:gluster_memory_percentage:avg1h
gluster_resident_memory_bytes
gluster_subvol_capacity_total_bytes
gluster_subvol_capacity_used_bytes
gluster_vg_extent_alloc_count
gluster_vg_extent_total_count
gluster_virtual_memory_bytes
gluster_volume_heal_count
gluster_volume_split_brain_heal_count

khalid151 commented 5 years ago

I'm having the same issue.

Profiling is enabled on all volumes and the collectors are configured for gluster-exporter, but there are still no profile metrics.

$ glusterd -V
glusterfs 4.1.8
Neraud commented 5 years ago

I've had an issue with a similar symptom: #151

To check if it's the same root cause, could you please enable debug logging and look at the exporter logs?

If you're seeing logs like level=debug msg="Error getting profile info" error="exit status 1" volume=[volume_name], it's probably the same issue.
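
In case it helps others hitting this, a minimal way to surface those debug lines (a sketch, assuming the log locations from the sample config at extras/conf/gluster-exporter.toml.sample) is to raise the exporter's log level in gluster-exporter.toml:

[globals]
log-dir = "/var/log/gluster-exporter"
log-file = "exporter.log"
log-level = "debug"   # default is "info"; "debug" is needed to see these messages

and then, after restarting the exporter, follow its log:

# tail -f /var/log/gluster-exporter/exporter.log | grep "Error getting profile info"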

khalid151 commented 5 years ago

Thanks! That was the same issue and it's fixed now.

limiao2008 commented 3 years ago

First: start profiling to monitor the workload

Start Profiling
You must start profiling to view the File Operation information for each brick.

To start profiling, use the following command:

# gluster volume profile <VOLNAME> start

For example, to start profiling on test-volume:

# gluster volume profile test-volume start
Profiling started on test-volume
When profiling on the volume is started, the following additional options are displayed in the Volume Info:

diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
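
To confirm that profiling data is actually being produced, independent of the exporter, it can also be checked directly with the gluster CLI, for example for test-volume:

# gluster volume profile test-volume info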

enable "gluster_volume_profile" in config see config template https://github.com/gluster/gluster-prometheus/blob/master/extras/conf/gluster-exporter.toml.sample

[collectors.gluster_volume_profile]
name = "gluster_volume_profile"
sync-interval = 5
disabled = false

Second: deploy the exporter on all nodes

Only the node that is the leader (IsLeader) is selected to collect this data, and the peer with the maximum UUID (lexicographically) is the leader.

In a peer list like the one below, only the node with the maximum UUID (b2157fd6-4d7a-485e-b21d-1c3785ab3fbd) collects profiling data, because it is the leader:

[root]# gluster pool list
UUID                    Hostname    State
91acc359-eee7-4faf-b47b-692351bd3fd9    192.63.1.19     Connected 
b2157fd6-4d7a-485e-b21d-1c3785ab3fbd    192.63.1.18     Connected 
5a8c3b1f-21e2-4657-baf7-48fe272fcbfc    192.63.1.110    Connected 
57f9c5fa-2dfa-4fc7-912c-619cfb047170    192.63.1.16     Connected 
a0a13141-b402-46ca-97a2-5d3703283626    10.63.1.17  Connected 
13b99272-b7e4-4aee-b3bf-ec8d456c04e8    localhost   Connected 

In the code at https://github.com/gluster/gluster-prometheus/blob/master/pkg/glusterutils/exporterd.go:

// IsLeader returns true or false based on whether the node is the leader of the cluster or not
func (g *GD1) IsLeader() (bool, error) {
    setDefaultConfig(g.config)
    peerList, err := g.Peers()
    if err != nil {
        return false, err
    }
    peerID, err := g.LocalPeerID()
    if err != nil {
        return false, err
    }
    var maxPeerID string
    //This for loop iterates among all the peers and finds the peer with the maximum UUID (lexicographically)
    for i, pr := range peerList {
        if pr.Online {
            if peerList[i].ID > maxPeerID {
                maxPeerID = peerList[i].ID
            }
        }
    }
    //Checks and returns true if maximum peerID is equal to the local peerID
    if maxPeerID == peerID {
        return true, nil
    }
    return false, nil
}
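
A minimal standalone sketch of the same selection rule (not the exporter's real types, just an illustration using the UUIDs from the pool list above):

package main

import "fmt"

// peer mirrors just the two fields IsLeader looks at; this is an
// illustration, not the exporter's actual Peer type.
type peer struct {
    ID     string
    Online bool
}

func main() {
    // UUIDs taken from the `gluster pool list` output above.
    peers := []peer{
        {ID: "91acc359-eee7-4faf-b47b-692351bd3fd9", Online: true},
        {ID: "b2157fd6-4d7a-485e-b21d-1c3785ab3fbd", Online: true},
        {ID: "5a8c3b1f-21e2-4657-baf7-48fe272fcbfc", Online: true},
        {ID: "57f9c5fa-2dfa-4fc7-912c-619cfb047170", Online: true},
        {ID: "a0a13141-b402-46ca-97a2-5d3703283626", Online: true},
        {ID: "13b99272-b7e4-4aee-b3bf-ec8d456c04e8", Online: true},
    }

    // Same rule as IsLeader: the online peer with the lexicographically
    // largest UUID is treated as the leader.
    var maxPeerID string
    for _, pr := range peers {
        if pr.Online && pr.ID > maxPeerID {
            maxPeerID = pr.ID
        }
    }
    fmt.Println("leader:", maxPeerID) // prints "leader: b2157fd6-4d7a-485e-b21d-1c3785ab3fbd"
}

Running this prints b2157fd6-4d7a-485e-b21d-1c3785ab3fbd, i.e. only the exporter on 192.63.1.18 reports the profile metrics. That is why the exporter needs to run on every node: whichever node happens to be the leader is then always covered.
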
limiao2008 commented 3 years ago

@khalid151 @Neraud @onnorom @csabahenk There's no problem; deploying the exporter on all nodes lets you get the volume profile metrics: https://github.com/gluster/gluster-prometheus/issues/147#issuecomment-743010344

vast0906 commented 2 years ago

Hello, I followed your method but it did not solve the problem.

Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
auth.allow: *
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on

gluster-exporter.toml

[globals]
gluster-cluster-id = ""
gluster-mgmt = "glusterd"
glusterd-dir = "/var/lib/glusterd"
gluster-binary-path = "gluster"
# If you want to connect to a remote gd1 host, set the variable gd1-remote-host
# However, using a remote host restricts the gluster cli to read-only commands
# The following collectors won't work in remote mode: gluster_volume_counts, gluster_volume_profile
#gd1-remote-host = "localhost"
gd2-rest-endpoint = "http://localhost:24007"
port = 9713
metrics-path = "/metrics"
log-dir = "/var/log/gluster-exporter"
log-file = "exporter.log"
log-level = "info"
# cache-ttl-in-sec = 0, disables caching
cache-ttl-in-sec = 30
# by default caching is turned off
# to enable caching, add the function-name to 'cache-enabled-funcs' list
# supported functions are,
# 'IsLeader', 'LocalPeerID', 'VolumeInfo'
# 'EnableVolumeProfiling', 'HealInfo', 'Peers',
# 'Snapshots', 'VolumeBrickStatus', 'VolumeProfileInfo'
cache-enabled-funcs = [ 'IsLeader', 'LocalPeerID', 'VolumeInfo' ]

[collectors.gluster_ps]
name = "gluster_ps"
sync-interval = 5
disabled = false

[collectors.gluster_peer_counts]
name = "gluster_peer_counts"
sync-interval = 5
disabled = false

[collectors.gluster_peer_info]
name = "gluster_peer_info"
sync-interval = 5
disabled = false

[collectors.gluster_brick]
name = "gluster_brick"
sync-interval = 5
disabled = false

[collectors.gluster_brick_status]
name = "gluster_brick_status"
sync-interval = 15
disabled = false

[collectors.gluster_volume_counts]
name = "gluster_volume_counts"
sync-interval = 5
disabled = false

[collectors.gluster_volume_status]
name = "gluster_volume_status"
sync-interval = 5
disabled = false

[collectors.gluster_volume_heal]
name = "gluster_volume_heal"
sync-interval = 5
disabled = false

[collectors.gluster_volume_profile]
name = "gluster_volume_profile"
sync-interval = 5
disabled = false

I don't have gluster_thinpoolmetadata* metrics. @limiao2008