Hi, @victor-sudakov!
As far as I can see, it still supports services with a health check defined. For example, if I register a service this way (with a health check provided):
curl 127.1:8500/v1/agent/service/register -XPUT -d '{"name": "example-service","port": 80,"check": { "http": "https://google.com", "interval": "10s"}}'
Then discovery returns the list of services:
python consul2zabbix.py discovery
{
"data": [
{
"{#SERVICEID}": "example-service"
}
]
}
Could you show your use case?

> Could you show your use case?
Patroni registers itself with Consul; its service name is "12-test5". It can be discovered from http://127.0.0.1:8500/v1/catalog/services or via the CLI command "consul catalog services":
$ curl -q http://127.0.0.1:8500/v1/catalog/services | jq
{
"12-test5": [
"master"
],
"consul": []
}
and its status can be viewed via http://127.0.0.1:8500/v1/catalog/service/12-test5.
Maybe Patroni registers itself in some non-standard way, but its service is not visible in http://127.0.0.1:8500/v1/health/node/$HOSTNAME. It is this service that I wanted to monitor.
This is what Patroni sends to Consul to register itself:
2020-08-07 02:36:48,145 INFO: Register service 12-test2, params {'service_id': '12-test2/cluster-test2', 'address': '172.31.42.9', 'port': 5432, 'check': {'http': 'http://172.31.42.9:8008/master', 'interval': '5s', 'DeregisterCriticalServiceAfter': '150.0s'}, 'tags': ['master']}
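For reference, here is (as far as I understand the agent API) the same registration expressed as a direct HTTP call; the field names follow Consul's /v1/agent/service/register schema:

import requests

# Equivalent, I believe, to what Patroni's log line above describes.
requests.put("http://127.0.0.1:8500/v1/agent/service/register", json={
    "ID": "12-test2/cluster-test2",
    "Name": "12-test2",
    "Address": "172.31.42.9",
    "Port": 5432,
    "Tags": ["master"],
    "Check": {
        "HTTP": "http://172.31.42.9:8008/master",
        "Interval": "5s",
        "DeregisterCriticalServiceAfter": "150.0s",
    },
})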
Looks quite good to me.
I think I'm beginning to understand the issue better. The syntax /v1/health/node/FOO shows only the services registered on the node FOO itself. This is probably intentional? But I was going to monitor a cluster of 3 dedicated Consul servers, and services are never registered on the Consul servers themselves; they are registered on the Consul clients running on the Patroni nodes. So I was seeing an empty list of services on the Consul servers.
Can something be done about it? Maybe it would be more productive to discover services with /v1/catalog/services and then check their health with /v1/health/checks/SERVICENAME? That way we could monitor from the Consul servers, not only from the endpoint clients. See the sketch below.
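Something like this is what I have in mind (just a sketch, assuming the script keeps talking to the local agent on 127.0.0.1:8500):

import json
import sys
import requests

CONSUL = "http://127.0.0.1:8500"

def discovery():
    # /v1/catalog/services lists services cluster-wide, not per node.
    services = requests.get(f"{CONSUL}/v1/catalog/services").json()
    return {"data": [{"{#SERVICEID}": name} for name in services]}

def status(service):
    # /v1/health/checks/<service> returns every check for the service,
    # regardless of which node it is registered on.
    checks = requests.get(f"{CONSUL}/v1/health/checks/{service}").json()
    # 1 = all checks passing, 0 = something is wrong (or no checks found)
    return int(bool(checks) and all(c["Status"] == "passing" for c in checks))

cmd = sys.argv[1] if len(sys.argv) > 1 else "discovery"
if cmd == "discovery":
    print(json.dumps(discovery(), indent=4))
else:
    print(status(sys.argv[2]))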
> The syntax /v1/health/node/FOO shows only the services registered on the node FOO itself. This is probably intentional?
Yep. The main reason is the native Zabbix mapping between a service and the node it lives on. The second reason: if you monitor everything with Zabbix, why not have Zabbix on every Consul node?
But it looks like it can easily be changed to monitor the entire cluster from a single node. Try https://github.com/dmitriy-myz/zabbix-templates/tree/monitor-from-one-node. Unfortunately I don't have Zabbix, so I can't check it.
> if you monitor everything with Zabbix, why not have Zabbix on every Consul node?
I will, but the irony is that the Zabbix agents on the dedicated Consul nodes will show nothing about Consul.
> But it looks like it can easily be changed to monitor the entire cluster from a single node. Try https://github.com/dmitriy-myz/zabbix-templates/tree/monitor-from-one-node. Unfortunately I don't have Zabbix, so I can't check it.
Thanks, I'll check on Monday and report.
Hmm,
$ zabbix_get -s test2 -k 'consul2zabbix[nodeStatus]'
Traceback (most recent call last):
File "/opt/scripts/consul2zabbix.py", line 87, in <module>
node = sys.argv[2]
IndexError: list index out of range
What is the consul2zabbix[nodeStatus] key supposed to return to Zabbix?
> What is the consul2zabbix[nodeStatus] key supposed to return to Zabbix?
It should return the health of the Consul node. I fixed it to report the status of the local Consul node. You can refactor it to use LLD and report the status of all Consul nodes if you want.
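To illustrate, this is roughly what the local-node check reduces to (a sketch, not the template's actual code; note it assumes Consul's node name equals gethostname()):

import socket
import requests

def node_status():
    node = socket.gethostname()  # assumption: Consul's node_name matches this
    checks = requests.get(f"http://127.0.0.1:8500/v1/health/node/{node}").json()
    # serfHealth and friends must all be passing for the node to count as up
    return int(bool(checks) and all(c["Status"] == "passing" for c in checks))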
> What is the consul2zabbix[nodeStatus] key supposed to return to Zabbix?
>
> It should return the health of the Consul node. I fixed it to report the status of the local Consul node. You can refactor it to use LLD and report the status of all Consul nodes if you want.
In the old version, consul2zabbix.py nodeStatus returns the status of the local agent; I think that's fine and should remain so in the new version. But the new version throws an error:
$ consul2zabbix.old/consul2zabbix.py nodeStatus
1
$ consul2zabbix.new/consul2zabbix.py nodeStatus
Traceback (most recent call last):
File "consul2zabbix.new/consul2zabbix.py", line 87, in <module>
node = sys.argv[2]
IndexError: list index out of range
$
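The crash itself looks like an unguarded positional argument; I'd guess the fix is something along these lines (hypothetical, I haven't looked at the commit):

import sys

# nodeStatus is called without a node argument, so sys.argv[2] must be optional
node = sys.argv[2] if len(sys.argv) > 2 else None  # None = use the local node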
I think it's fixed in the https://github.com/dmitriy-myz/zabbix-templates/tree/monitor-from-one-node branch (commit 641852cae41a22a4e1e088807bf8f8cc4cf0ff3f). Could you try it?
Yes, thank you, I think it works now:
$ consul2zabbix.new1/consul2zabbix.py discovery
{
"data": [
{
"{#NODENAME}": "cluster-test1",
"{#SERVICEID}": "12-test2/cluster-test1"
}
]
}
$ consul2zabbix.new1/consul2zabbix.py nodeStatus
1
$ consul2zabbix.new1/consul2zabbix.py status cluster-test1 12-test2/cluster-test1
1
$
It may break, though, if Consul's node_name != gethostname(); I think that's possible with FQDNs and the like.
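A possible safeguard would be to ask the agent for its own node name instead of calling gethostname() (again just a sketch):

import requests

def consul_node_name():
    # /v1/agent/self reports the agent's configuration, including the
    # node name it actually registered under.
    return requests.get("http://127.0.0.1:8500/v1/agent/self").json()["Config"]["NodeName"]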
Hello Dmitriy!
Are you going to revive the consul2zabbix template? The Python script does not work with the modern Consul v1.8.1; perhaps the API has changed significantly.
If you are interested, please let me know. I've done some debugging and can provide more details, though I lack the expertise to provide a complete fix.