jamesyu558 opened this issue 3 years ago
I'm not able to reproduce this issue: your example matches the regex we're using to parse quorumtool output.
What's the output of ha_cluster_exporter --version?
Here it is:
total 18436
-rwxr-xr-x. 1 postgres postgres 9437184 Apr 6 08:37 ha_cluster_exporter-amd64
version 1.2.1+git.1606912430.4fceb77 built with go1.15.5 linux/amd64 2020-12-02T17:30:26+00:00
If you have a debug module, I should be able to install it and see exactly what causes this parser error. Please let me know if you need more information from me. Really appreciate your help!
In my environment, I have Pacemaker installed as well, together with this Prometheus exporter for Grafana.
Nope, we don't have a debug module. I guess your best shot is to download the sources and run it with a step debugger to inspect what input is actually being fed to the regex here: https://github.com/ClusterLabs/ha_cluster_exporter/blob/4fceb77b3a195bbce12f54e23569a66e20f50bc3/collector/corosync/parser.go#L85-L93
Btw, which corosync version are you using?
Hold on, let me check.
Corosync Cluster Engine, version '2.4.3' Copyright (c) 2006-2009 Red Hat, Inc.
How exactly do I debug this on RHEL7? Do you have specific steps to set it up?
Or modify the source code to print out the variable "quorumToolOutput" from "parseNodeId" when it gets called?
You could clone the project and then use https://github.com/go-delve/delve to debug it, but that assumes some familiarity with the Go language and toolkit!
Thanks, I can figure this out. I'll let you know soon what value of "quorumToolOutput" is passed to this function. Thank you again.
As for modifying the source code to print out the variable "quorumToolOutput" from "parseNodeId" when it gets called: yes, you could also do that by adding
log.Debug(string(quorumToolOutput))
after line 85.
even better...thx
I'll get back to you tomorrow morning, around this time.
We modified that function like this:
```go
func parseNodeId(quorumToolOutput []byte) (string, error) {
	nodeRe := regexp.MustCompile(`(?m)Node ID:\s+(\w+)`)
	matches := nodeRe.FindSubmatch(quorumToolOutput)
	// Debugging aid: append the raw corosync-quorumtool output to the error.
	x := string(quorumToolOutput)
	if matches == nil {
		return "", errors.New("could NOT find Node ID line :" + x)
	}
	return string(matches[1]), nil
}
```
Then in the log, we see this: could not parse node id in corosync-quorumtool output: could NOT find Node ID line :"
Notice that we changed "not" to "NOT" on purpose, to check that our modified code is really the one running. It looks like the x variable is empty.
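When the captured output may be empty, concatenating it into the message makes an empty string hard to tell apart from whitespace. A minimal sketch, assuming the same regex as the modified function above, that quotes the input with %q so an empty capture shows up as "" and stray newlines become visible; parseNodeIDDebug is a hypothetical name, not part of the exporter.

```go
package main

import (
	"fmt"
	"regexp"
)

// nodeIDRe is the same pattern used in the modified parseNodeId above.
var nodeIDRe = regexp.MustCompile(`(?m)Node ID:\s+(\w+)`)

// parseNodeIDDebug is a hypothetical debugging variant of parseNodeId.
// It reports the input length and quotes the raw bytes with %q, so an
// empty capture prints as "" and whitespace shows up as escape sequences.
func parseNodeIDDebug(quorumToolOutput []byte) (string, error) {
	matches := nodeIDRe.FindSubmatch(quorumToolOutput)
	if matches == nil {
		return "", fmt.Errorf("could not find Node ID line in %d bytes of output: %q",
			len(quorumToolOutput), quorumToolOutput)
	}
	return string(matches[1]), nil
}

func main() {
	_, err := parseNodeIDDebug([]byte(""))
	fmt.Println(err)
}
```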
Any more ideas?
Hello, is there any update about this issue?
I need an example output from corosync-quorumtool to reproduce the issue, that is, an output that doesn't correctly match the (?m)Node ID:\s+(\w+) regular expression. You can verify that yourself at https://regex101.com/r/riyToT/1.
As you can see, the example provided by OP matches correctly, so I don't know what's up there.
Until I get an actual example, there is not much I can do.
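The same check can be reproduced locally with Go's regexp package instead of regex101. This sketch runs the (?m)Node ID:\s+(\w+) pattern against a trimmed excerpt of the corosync-quorumtool output posted later in this thread (Debian 11 report) and against an empty input, which is what an unprivileged run can produce; findNodeID is an illustrative helper, not the exporter's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern quoted in this thread for the Node ID line.
var nodeIDRe = regexp.MustCompile(`(?m)Node ID:\s+(\w+)`)

// findNodeID returns the first capture group, or ok=false when no line matches.
func findNodeID(output string) (id string, ok bool) {
	m := nodeIDRe.FindStringSubmatch(output)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// Excerpt of corosync-quorumtool output as posted in this thread.
const sample = `Quorum information
------------------
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 2
Ring ID: 2.4a46b
Quorate: Yes
`

func main() {
	for _, in := range []string{sample, ""} {
		id, ok := findNodeID(in)
		fmt.Printf("matched=%v id=%q\n", ok, id)
	}
}
```

A real sample matches, while the empty string does not, so an empty capture is enough to trigger "could not find Node ID line".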
Hello @stefanotorresi, I have the same issue. Here is the output:
ha_cluster_exporter time="2023-05-02T18:37:54Z" level=warning msg="Corosync Collector scrape failed: could not parse ring id and seq number in corosync-quorumtool output: could not find Ring ID line"
Quorum information
------------------
Date: Tue May 2 18:38:03 2023
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 2
Ring ID: 2.4a46b
Quorate: Yes
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 1
Flags: 2Node Quorate LastManStanding
Membership information
----------------------
Nodeid Votes Name
2 1 lb-int01.xxx.yyy.zzz (local)
3 1 lb-int02.xxx.yyy.zzz
Issue on Debian 11
hmm, ok, that does match the regex, so it's not helping me either: https://regex101.com/r/JuhDCK/1
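Note that the error in this report concerns the Ring ID line rather than the Node ID line. The sketch below uses a hypothetical pattern, not the exporter's actual regex, to split the dotted Ring ID shown above (2.4a46b) into two fields. The names ringID and seq follow the wording of the exporter's error message; which field the exporter actually treats as which is an assumption, and only the dotted format seen in this thread is handled.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// ringIDRe is a hypothetical pattern for the dotted Ring ID format seen in
// the corosync 3.1.2 output above ("Ring ID: 2.4a46b"); it is not the
// exporter's own regex.
var ringIDRe = regexp.MustCompile(`(?m)^Ring ID:\s+(\w+)\.(\w+)`)

// parseRingID returns the two dot-separated fields of the Ring ID line.
// Naming them ringID and seq is an assumption based on the error message.
func parseRingID(output string) (ringID, seq string, err error) {
	m := ringIDRe.FindStringSubmatch(output)
	if m == nil {
		return "", "", errors.New("could not find Ring ID line")
	}
	return m[1], m[2], nil
}

func main() {
	ringID, seq, err := parseRingID("Node ID: 2\nRing ID: 2.4a46b\nQuorate: Yes\n")
	fmt.Println(ringID, seq, err)
}
```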
oh, by the way, please always report the versions of the exporter and corosync you're using.
Here they are:
corosync 3.1.2-2 ha_cluster_exporter-1.0.1
I've just updated to 1.3.2 and it seems fixed :thinking:
tl;dr: in case it helps anyone, make sure you test running corosync-quorumtool as the same user that your ha_cluster_exporter process runs under, and check that it actually works for that user.
./ha-cluster-exporter --version
ha_cluster_exporter, version 1.3.3+git.1683650163.1000ba6 (branch: HEAD, revision: 1000ba696a5ef85737f70808a12e5a01bee5c281)
build user: runner@fv-az1100-952
build date: 20230529-08:55:18
go version: go1.20.4
platform: linux/amd64
tags: netgo
$ corosync-quorumtool
Cannot initialize CMAP service
In this case (an unprivileged user), and I guess in other cases too, corosync-quorumtool exits with exit code 1, which is ignored as per this comment. stdout is empty, hence the failure to find a node ID, while stderr contains that error. The fix here was to make sure the user has the proper permissions so that corosync-quorumtool doesn't fail.
I guess a possible improvement would be to keep ignoring the return code, as is currently done, but also fail when stdout is empty and stderr is not, since that might indicate a failure of the command itself?
That's a good suggestion! We'll look into implementing this tweak in the next iteration.
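A rough sketch of that tweak, assuming it runs before the Node ID and Ring ID parsing: the exit code is still ignored, but an empty stdout combined with a non-empty stderr is reported as a command failure. checkQuorumToolOutput is a hypothetical name, not an existing exporter function, and treating whitespace-only output as empty is an extra design choice of this sketch.

```go
package main

import (
	"bytes"
	"fmt"
)

// checkQuorumToolOutput implements the heuristic suggested in the thread:
// the exit code is not consulted, but an empty stdout together with a
// non-empty stderr is reported as a failure of the command itself.
// Whitespace-only output is treated as empty.
func checkQuorumToolOutput(stdout, stderr []byte) error {
	if len(bytes.TrimSpace(stdout)) == 0 && len(bytes.TrimSpace(stderr)) > 0 {
		return fmt.Errorf("corosync-quorumtool failed: %s", bytes.TrimSpace(stderr))
	}
	return nil
}

func main() {
	err := checkQuorumToolOutput(nil, []byte("Cannot initialize CMAP service\n"))
	fmt.Println(err)
}
```

With this check in place, the unprivileged case would surface the actual CMAP error instead of the misleading "could not find Node ID line".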
Hi Support,
The following corosync parser error on "Node ID" exists in v1.2.0, so I upgraded ha_cluster_exporter from v1.2.0 to the latest version, v1.2.1, on my RHEL7 VM. Unfortunately, the error still exists in v1.2.1.
Note that the field the corosync parser complains about is "Node ID". The error message is: msg="'corosync' collector scrape failed: corosync parser error: could not parse node id in corosync-quorumtool output: could not find Node ID line"
See below: