dmitriy-myz / zabbix-templates

MIT License

consul2zabbix does not support the modern consul? #9

Closed victor-sudakov closed 2 years ago

victor-sudakov commented 4 years ago

Hello Dmitriy!

Are you going to revive the consul2zabbix template? The Python script does not work with the modern Consul v1.8.1, perhaps the API has changed significantly.

If you are interested, please let me know, I've done some debugging and can provide more details, though I lack the expertise to provide a complete fix.

dmitriy-myz commented 4 years ago

Hi, @victor-sudakov!

As far as I can see, it still supports services that have a health check defined. For example, if I register a service this way (with a health check provided):

curl 127.1:8500/v1/agent/service/register -XPUT -d '{"name": "example-service","port": 80,"check": { "http": "https://google.com", "interval": "10s"}}'

Then discovery will return list of services

python consul2zabbix.py discovery 
{
    "data": [
        {
            "{#SERVICEID}": "example-service"
        }
    ]
}

Could you show your use case?
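The discovery behaviour described above (only services with a health check are listed) can be sketched as follows. This is not the code of consul2zabbix.py. It assumes the script reads the check list from /v1/health/node/NODE, as discussed later in this thread, and `checks_to_lld` is a hypothetical helper name. The `ServiceID` field is part of Consul's check objects, and node-level checks such as serfHealth carry an empty `ServiceID`.

```python
import json


def checks_to_lld(checks):
    """Turn a list of Consul check objects into Zabbix LLD JSON.

    Only checks bound to a service (non-empty "ServiceID") are kept, so a
    service registered without any health check never shows up here.
    """
    service_ids = sorted({c["ServiceID"] for c in checks if c.get("ServiceID")})
    lld = {"data": [{"{#SERVICEID}": sid} for sid in service_ids]}
    return json.dumps(lld, indent=4)
```

Feeding it the decoded response of /v1/health/node/NODE yields a document shaped like the discovery output shown above.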

victor-sudakov commented 4 years ago

> Could you show your use case?

Patroni registers itself with Consul, its service name is "12-test5". It can be discovered from http://127.0.0.1:8500/v1/catalog/services or via the CLI command "consul catalog services"

$ curl -q http://127.0.0.1:8500/v1/catalog/services | jq
{
  "12-test5": [
    "master"
  ],
  "consul": []
}

and its status can be viewed via http://127.0.0.1:8500/v1/catalog/service/12-test5.

Maybe Patroni registers itself in some non-standard way, but its service is not visible in http://127.0.0.1:8500/v1/health/node/$HOSTNAME. It is this service that I wanted to monitor.

victor-sudakov commented 4 years ago

This is what Patroni sends to Consul to register itself:

2020-08-07 02:36:48,145 INFO: Register service 12-test2, params {'service_id': '12-test2/cluster-test2', 'address': '172.31.42.9', 'port': 5432, 'check': {'http': 'http://172.31.42.9:8008/master', 'interval': '5s', 'DeregisterCriticalServiceAfter': '150.0s'}, 'tags': ['master']}

Looks quite good to me.

I think I'm beginning to understand the issue better. The syntax /v1/health/node/FOO shows only services registered on the node FOO itself. This is probably intentional? But I was going to monitor a cluster of 3 dedicated Consul servers, and services are never registered on the Consul servers themselves; they are registered on the Consul clients running on the Patroni nodes. So I was seeing an empty list of services on the Consul servers.

Can something be done about it? Maybe it would be more productive to discover services with /v1/catalog/services and then check their health with /v1/health/checks/SERVICENAME? That way we could monitor the Consul servers, not only the endpoint clients.
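A minimal sketch of that proposal, assuming the agent listens on 127.0.0.1:8500 as elsewhere in this thread. The function names (`get_json`, `discover`, `service_status`) are hypothetical, the {#NODENAME} and {#SERVICEID} macros mirror the discovery output that appears later in this thread, and mapping "all checks passing" to 1 is an assumption about what the template expects.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

CONSUL = "http://127.0.0.1:8500"  # assumed agent address


def get_json(path, base=CONSUL):
    """GET a Consul HTTP API path and decode the JSON body."""
    with urlopen(base + path, timeout=5) as resp:
        return json.load(resp)


def discover(fetch=get_json):
    """List (node, service ID) pairs cluster-wide as Zabbix LLD data."""
    services = fetch("/v1/catalog/services")  # {"service name": [tags]}
    seen = set()
    for name in services:
        for check in fetch("/v1/health/checks/" + quote(name, safe="")):
            if check.get("ServiceID"):
                seen.add((check["Node"], check["ServiceID"]))
    return {"data": [{"{#NODENAME}": n, "{#SERVICEID}": s} for n, s in sorted(seen)]}


def service_status(checks, node, service_id):
    """Return 1 if the service instance has checks and all pass, else 0."""
    mine = [c for c in checks
            if c.get("Node") == node and c.get("ServiceID") == service_id]
    return int(bool(mine) and all(c.get("Status") == "passing" for c in mine))
```

Because both endpoints answer from the catalog, this can run against any agent in the cluster, including a dedicated server. The `fetch` parameter exists only so the logic can be exercised without a running Consul.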

dmitriy-myz commented 4 years ago

> The syntax /v1/health/node/FOO shows only services registered on the node FOO itself. This is probably intentional?

Yep. The main reason is that it maps natively onto Zabbix: each service appears under the node where it is located. The second reason: if you monitor everything with Zabbix, why don't you have Zabbix on every Consul node?

But it looks like it can easily be changed to monitor the entire cluster from a single node. Try https://github.com/dmitriy-myz/zabbix-templates/tree/monitor-from-one-node. Unfortunately, I don't have Zabbix, so I can't check it.

victor-sudakov commented 4 years ago

> If you monitor everything with Zabbix, why don't you have Zabbix on every Consul node?

I will, but the irony is that zabbix-agents on dedicated consul nodes will show nothing about consul.

> But it looks like it can easily be changed to monitor the entire cluster from a single node. Try https://github.com/dmitriy-myz/zabbix-templates/tree/monitor-from-one-node. Unfortunately, I don't have Zabbix, so I can't check it.

Thanks, I'll check on Monday and report.

victor-sudakov commented 4 years ago

Hmm,

$ zabbix_get -s test2 -k 'consul2zabbix[nodeStatus]'
Traceback (most recent call last):
  File "/opt/scripts/consul2zabbix.py", line 87, in <module>
    node = sys.argv[2]
IndexError: list index out of range

What is the consul2zabbix[nodeStatus] key supposed to return to Zabbix?

dmitriy-myz commented 4 years ago

> What is the consul2zabbix[nodeStatus] key supposed to return to Zabbix?

It should return the health of the Consul node. I fixed it to report the status of the local Consul node. You can refactor it to use LLD and report the status of all Consul nodes if you want.
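The IndexError in the traceback above comes from reading sys.argv[2] unconditionally when nodeStatus is called without a node name. One way to guard against that, sketched here under the assumption that the local node is the desired default, is to fall back to the host name. This is not the actual change made in the branch, and `node_from_args` is a hypothetical helper.

```python
import socket
import sys


def node_from_args(argv, default=None):
    """Return the node name from argv[2], or fall back to a default.

    When no node argument is given, use `default` if provided, otherwise
    the local host name.
    """
    if len(argv) > 2:
        return argv[2]
    return default if default is not None else socket.gethostname()
```

In the script this would be called as `node = node_from_args(sys.argv)`, so both `nodeStatus` and `nodeStatus NODE` are accepted.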

victor-sudakov commented 4 years ago

> > What is the consul2zabbix[nodeStatus] key supposed to return to Zabbix?
>
> It should return the health of the Consul node. I fixed it to report the status of the local Consul node. You can refactor it to use LLD and report the status of all Consul nodes if you want.

In the old version, consul2zabbix.py nodeStatus returns the status of the local agent. I think that is fine and should stay that way in the new version. However, the new version now throws an error:

$ consul2zabbix.old/consul2zabbix.py nodeStatus
1
$ consul2zabbix.new/consul2zabbix.py nodeStatus
Traceback (most recent call last):
  File "consul2zabbix.new/consul2zabbix.py", line 87, in <module>
    node = sys.argv[2]
IndexError: list index out of range
$

dmitriy-myz commented 4 years ago

I think it is fixed in the branch https://github.com/dmitriy-myz/zabbix-templates/tree/monitor-from-one-node (commit 641852cae41a22a4e1e088807bf8f8cc4cf0ff3f). Could you try it?

victor-sudakov commented 4 years ago

Yes, thank you, I think it works now:

$ consul2zabbix.new1/consul2zabbix.py discovery
{
    "data": [
        {
            "{#NODENAME}": "cluster-test1",
            "{#SERVICEID}": "12-test2/cluster-test1"
        }
    ]
}
$ consul2zabbix.new1/consul2zabbix.py nodeStatus
1
$ consul2zabbix.new1/consul2zabbix.py status cluster-test1 12-test2/cluster-test1
1
$ 

It may break if Consul's node_name differs from gethostname(), which I think is possible with FQDNs and such.
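One possible way around that mismatch, sketched here as an assumption rather than as something the script does, is to ask the local agent for its own node name via /v1/agent/self and use gethostname() only as a fallback. The helper names are hypothetical. `Config.NodeName` and `Member.Name` are fields of the /v1/agent/self response.

```python
import json
import socket
from urllib.request import urlopen


def node_name_from_self(agent_self):
    """Extract the node name from a decoded /v1/agent/self response, or None."""
    config = agent_self.get("Config") or {}
    member = agent_self.get("Member") or {}
    return config.get("NodeName") or member.get("Name")


def local_node_name(base="http://127.0.0.1:8500"):
    """Ask the local agent for its node name; fall back to the host name."""
    try:
        with urlopen(base + "/v1/agent/self", timeout=5) as resp:
            name = node_name_from_self(json.load(resp))
    except (OSError, ValueError):
        # Agent unreachable or body not valid JSON: use the fallback below.
        name = None
    return name or socket.gethostname()
```

If ACLs are enabled, the agent may require a token for this endpoint, which the sketch does not handle.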