2ndQuadrant / pglogical

Logical Replication extension for PostgreSQL 17, 16, 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
http://2ndquadrant.com/en/resources/pglogical/

`pglogical.show_subscription_status` reports `replicating` when provider node is down #20

Closed juxtin closed 8 years ago

juxtin commented 8 years ago

I have a test environment with two nodes (on separate VMs), let's call them "Primary" and "Replica". Not surprisingly, Primary is a provider and Replica is a subscriber.

Under normal circumstances, there are no problems at all. When I check show_subscription_status(), I see what I would expect to see:

SELECT subscription_name, status FROM pglogical.show_subscription_status();

 subscription_name |   status
-------------------+-------------
 subscription1     | replicating

And indeed, if I gracefully shut down Primary, this also has the expected result on Replica:

SELECT subscription_name, status FROM pglogical.show_subscription_status();

 subscription_name |   status
-------------------+-------------
 subscription1     | down

However, if I abruptly pause or kill the Primary VM without allowing it to stop postgres, Replica still thinks everything is OK:

SELECT subscription_name, status FROM pglogical.show_subscription_status();

 subscription_name |   status
-------------------+-------------
 subscription1     | replicating

Is there any way to tell that there's a failure under conditions like that? I'm thinking of real-world issues like power failures and network partitions that affect provider nodes but not subscriber nodes.

PJMODOS commented 8 years ago

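That's probably TCP still thinking that the connection is active: when the provider disappears without closing the connection, the subscriber sees no error until TCP itself gives up. There is some work being done on this. In the meantime, you can specify the TCP keepalive parameters (keepalives, keepalives_idle, keepalives_interval, keepalives_count) as part of the connection string, as described in https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS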
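
As a rough sketch of what that could look like, the libpq keepalive keywords can be appended to the provider DSN given to pglogical.create_subscription. The host, database, and user names below are hypothetical, and the keepalive values are illustrative assumptions rather than recommendations:

```sql
-- Hypothetical host, database, and user; adjust to your environment.
-- keepalives=1           enable client-side TCP keepalives (the libpq default)
-- keepalives_idle=10     seconds of inactivity before the first keepalive probe
-- keepalives_interval=5  seconds between unacknowledged probes
-- keepalives_count=3     lost probes before the connection is considered dead
SELECT pglogical.create_subscription(
    subscription_name := 'subscription1',
    provider_dsn := 'host=primary.example.com port=5432 dbname=appdb user=replicator keepalives=1 keepalives_idle=10 keepalives_interval=5 keepalives_count=3'
);
```

With these example values, a provider that vanishes without closing its connection should be detected after roughly 10 + 3 × 5 = 25 seconds of silence, after which the subscriber's connection fails and the status should stop reporting replicating. Shorter values detect failures sooner but are more likely to drop the connection during brief network hiccups.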

juxtin commented 8 years ago

Thanks for your attention. Adjusting the TCP settings did, in fact, fix the issue.