2ndQuadrant / pglogical

Logical Replication extension for PostgreSQL 17, 16, 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
http://2ndquadrant.com/en/resources/pglogical/

Replication slot on provider disappears #333

Open joshuabaird opened 3 years ago

joshuabaird commented 3 years ago

I had a PG9 provider and a PG13 subscriber where the initial data sync had been running for 4 days. I noticed that pglogical.show_subscription_status on the subscriber had transitioned to down and that the replication slot on the provider was gone. The provider began logging this:

2021-08-17 01:24:04 UTC:10.178.81.137(23030):replication@shipment_feedback_service:[6139]:ERROR: replication slot "pgl_shipment37dc60c_provider_thesubscription" does not exist

No relevant logs on the subscriber that I could find.

Typically, dropping the subscription and re-creating it on the subscriber re-creates the replication slot on the provider. This is not happening though, and I don't see any logs that describe why.

What would cause the replication slot on the provider to be deleted and why isn't re-creating the subscription recreating it?

petere commented 3 years ago

Replication slots don't just disappear, so I would look into that. Check whether some other software is running that tries to manage replication, backups, or something related, and that might have its own opinions about which replication slots should exist.
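A minimal sketch of that check, assuming you can run a query on the provider and collect the slot names it returns. The helper name missing_slots and the sample query result are hypothetical; the expected slot name is the one from the error message above.

```python
from typing import Iterable, List

# Run this on the provider. These columns have been part of the
# pg_replication_slots view since PostgreSQL 9.4.
LIST_SLOTS_SQL = """
SELECT slot_name, plugin, slot_type, database, active, restart_lsn
FROM pg_replication_slots
ORDER BY slot_name;
"""


def missing_slots(expected: Iterable[str], existing: Iterable[str]) -> List[str]:
    """Return the expected slot names that are absent on the provider."""
    present = set(existing)
    return [name for name in expected if name not in present]


# Hypothetical query result: the slot names returned by LIST_SLOTS_SQL.
existing = ["some_other_slot"]
# Slot name taken from the provider's error message.
expected = ["pgl_shipment37dc60c_provider_thesubscription"]
print(missing_slots(expected, existing))
```

Running the query periodically and logging the result would at least narrow down when the slot goes away, which can then be matched against the provider's logs.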

joshuabaird commented 3 years ago

Hi @petere - thanks for responding! No other software running, although this is Amazon RDS. But, this is the first time I have ever seen this happen, and also the first time that I have seen create_subscription not re-create the replication slot on the provider.

The initial data load for this table is also VERY slow for some reason (it's only ~70GB). I'm considering re-creating the subscription with synchronize_data=false and then using alter_subscription_resynchronize_table to sync it. From what I understand, this may yield better (faster) results.

If not, are there any options to restore the table from a pg_dump and then refresh/re-start sync after that?
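The plan described above (re-create the subscription without the initial data copy, then resynchronize table by table) could be sketched roughly as follows. This only builds the SQL to run on the subscriber; the subscription name, DSN, and table name are placeholders, and the helpers sql_literal and resync_statements are hypothetical.

```python
from typing import List


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def resync_statements(subscription: str, provider_dsn: str, tables: List[str]) -> List[str]:
    """Build the statements to run on the subscriber, in order."""
    name = sql_literal(subscription)
    statements = [
        # Drop the old subscription first.
        "SELECT pglogical.drop_subscription({});".format(name),
        # Re-create it without the initial data copy.
        "SELECT pglogical.create_subscription("
        "subscription_name := {}, provider_dsn := {}, "
        "synchronize_data := false);".format(name, sql_literal(provider_dsn)),
    ]
    for table in tables:
        # Resynchronize each table individually.
        statements.append(
            "SELECT pglogical.alter_subscription_resynchronize_table({}, {});".format(
                name, sql_literal(table)
            )
        )
    return statements


# Placeholder values, not taken from a real setup.
for stmt in resync_statements(
    "thesubscription",
    "host=provider.example.com dbname=shipment_feedback_service user=replication",
    ["public.my_table"],
):
    print(stmt)
```

Note that, as far as the pglogical documentation describes it, alter_subscription_resynchronize_table truncates the table on the subscriber before copying, so this is a sketch of the sequence rather than something to run unreviewed.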

eulerto commented 3 years ago

No. But if your database is big (several terabytes), you should consider using pglogical_create_subscriber, which creates a logical instance from a base backup. It is faster than a logical clone. It seems it is not an option for you since you're using RDS. It's been some time since I checked the RDS interface, but maybe they already provide an option to create a logical replica using pglogical_create_subscriber.

martinmarques commented 3 years ago

That will be quite tricky @eulerto, given they are on RDS (@joshuabaird didn't mention whether the provider, the subscriber, or both are on RDS). My 2 cents here: we don't know what's running in RDS, as that's software deployed by Amazon, so it's equally hard to know whether something there is dropping slots.

joshuabaird commented 3 years ago

Thanks @eulerto and @martinmarques. Correct, we can't use the basebackup option, so I guess we're stuck with having pglogical sync the data for us, either with synchronize_data=true, or with synchronize_data=false and manually syncing the tables.

sreejithvelath commented 2 years ago

@eulerto Any update on this issue? I am facing a similar issue on Azure.