dalibo / check_patroni

A nagios plugin for patroni.
PostgreSQL License

Standby cluster not reporting any healthy replica. #72

Open MLyssens opened 1 day ago

MLyssens commented 1 day ago

When I use the cluster_has_replica check on a standby cluster, the check goes critical because it doesn't find any healthy replicas:

CLUSTERHASREPLICA CRITICAL - healthy_replica is 0

The patroni API shows 2 replicas:

{
   "members":[
      {
         "name":"patroni-dc02-node1",
         "role":"standby_leader",
         "state":"streaming",
         "api_url":"http://0.0.0.0:8008/patroni",
         "host":"10.0.0.1",
         "port":5432,
         "timeline":5
      },
      {
         "name":"patroni-dc02-node2",
         "role":"replica",
         "state":"streaming",
         "api_url":"http://0.0.0.0:8008/patroni",
         "host":"10.0.0.2",
         "port":5432,
         "timeline":5,
         "lag":0
      },
      {
         "name":"patroni-dc02-node3",
         "role":"replica",
         "state":"streaming",
         "api_url":"http://0.0.0.0:8008/patroni",
         "host":"10.0.0.3",
         "port":5432,
         "timeline":5,
         "lag":0
      }
   ],
   "scope":"patroni-dc02"
}

Also, patronictl list shows no issues:

+ Cluster: patroni-dc02 (7425528900838155819) ---+-----------+----+-----------+
| Member             | Host     | Role           | State     | TL | Lag in MB |
+--------------------+----------+----------------+-----------+----+-----------+
| patroni-dc02-node1 | 10.0.0.1 | Standby Leader | streaming |  5 |           |
| patroni-dc02-node2 | 10.0.0.2 | Replica        | streaming |  5 |         0 |
| patroni-dc02-node3 | 10.0.0.3 | Replica        | streaming |  5 |         0 |
+--------------------+----------+----------------+-----------+----+-----------+

Any pointers on what I am missing here?

blogh commented 1 day ago

Hi,

I think it's a bug: when looking up the leader's timeline, I don't check for a standby leader, so on a standby cluster no leader timeline is found. Could you run it in debug mode (-vvv), please?

Thanks for the report.
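
To illustrate the suspected cause, here is a minimal sketch of how a replica count can drop to 0 when only the "leader" role is used to find the reference timeline. This is not the actual check_patroni code: the function name count_healthy_replicas, the role and state sets, and the leader_roles parameter are all hypothetical, and the member data is trimmed from the API output in the report.

```python
# Hypothetical sketch, not check_patroni's implementation.
# Member data trimmed from the API output quoted in the report.
CLUSTER = {
    "members": [
        {"name": "patroni-dc02-node1", "role": "standby_leader",
         "state": "streaming", "timeline": 5},
        {"name": "patroni-dc02-node2", "role": "replica",
         "state": "streaming", "timeline": 5, "lag": 0},
        {"name": "patroni-dc02-node3", "role": "replica",
         "state": "streaming", "timeline": 5, "lag": 0},
    ],
    "scope": "patroni-dc02",
}

# Assumed role/state sets, chosen for illustration only.
LEADER_ROLES = {"leader", "standby_leader"}
REPLICA_ROLES = {"replica", "sync_standby"}
HEALTHY_STATES = {"running", "streaming"}


def count_healthy_replicas(cluster, leader_roles=LEADER_ROLES):
    """Count replicas that are in a healthy state on the leader's timeline."""
    members = cluster["members"]
    # Reference timeline: taken from the first member whose role counts as
    # a leader. If no such member exists, this stays None.
    leader_timeline = next(
        (m["timeline"] for m in members if m["role"] in leader_roles),
        None,
    )
    healthy = 0
    for member in members:
        if member["role"] not in REPLICA_ROLES:
            continue
        if member["state"] not in HEALTHY_STATES:
            continue
        # With leader_timeline == None, no replica can match.
        if member["timeline"] != leader_timeline:
            continue
        healthy += 1
    return healthy


# Only "leader" recognised: no reference timeline, so nothing is healthy.
print(count_healthy_replicas(CLUSTER, leader_roles={"leader"}))
# "standby_leader" recognised too: both replicas match timeline 5.
print(count_healthy_replicas(CLUSTER))
```

Under these assumptions, the first call reproduces the "healthy_replica is 0" result from the report, and the second counts both streaming replicas.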

blogh commented 20 hours ago

I made a PR for this. Can you test it?

MLyssens commented 18 hours ago

Hi, thanks already. I can test this tomorrow. Will let you know the outcome!
