disqus / pgshovel

A change data capture system for PostgreSQL
Apache License 2.0

Prevent adding tables with existing content to replication sets. #16

Open tkaemming opened 9 years ago

tkaemming commented 9 years ago

This is a stopgap until #15 to prevent accidentally corrupting the data set on replication targets. If a table has existing content before it is added to the replication set, that initial data set will never be replicated and the consistent snapshot exposed to replication targets will be broken. (Technically, only replication targets that were bootstrapped before the content was added to the table would be corrupted, but it's probably better to just unilaterally prevent the behavior until a better solution exists.)

This will likely require an exclusive lock on the table during the configuration change, running COUNT(*) on the table, and only allowing the configuration change to continue if there are 0 existing rows.
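A minimal sketch of that check, assuming a DB-API 2.0 cursor (for example from psycopg2) that is already inside the transaction making the configuration change. The names `ensure_table_is_empty`, `quote_identifier`, and `TableNotEmpty` are hypothetical and not part of pgshovel, and the choice of `EXCLUSIVE` lock mode is an assumption:

```python
class TableNotEmpty(Exception):
    """Raised when a table already holds rows and must not be added."""


def quote_identifier(name):
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def ensure_table_is_empty(cursor, table):
    """Lock `table` against writes and verify that it contains no rows.

    `cursor` is any DB-API 2.0 cursor inside an open transaction. The lock
    is held until that transaction commits or rolls back.
    """
    quoted = quote_identifier(table)
    # Assumption: EXCLUSIVE mode is enough here. It blocks concurrent
    # INSERT/UPDATE/DELETE while still allowing plain reads.
    cursor.execute("LOCK TABLE {} IN EXCLUSIVE MODE".format(quoted))
    cursor.execute("SELECT COUNT(*) FROM {}".format(quoted))
    (count,) = cursor.fetchone()
    if count != 0:
        raise TableNotEmpty("{} has {} existing rows".format(table, count))
```

The configuration change would only proceed if this returns without raising. Because the lock lasts until the transaction ends, no rows can slip in between the count and the change.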

Fluxx commented 9 years ago

So I want to make sure I understand the issue here. The sequence of events is as follows:

  1. I have an existing table I want to replicate out of my database with PGShovel. This table has existing data in it - let's say my users table.
  2. I set up PGShovel and add users to a replication set, aiming to replicate my changes out of the database.
  3. Once the replication set for the table is set up, only mutations to users from that point on are replicated out.
  4. Any mutations that occurred before the replication set was created, as well as the snapshots of any rows that already existed in that table, do not get replicated.
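The sequence above can be illustrated with a toy model. This is not pgshovel code: the table is a plain dict, `replay` is a hypothetical helper, and an update to a row the target has never seen is treated as an upsert purely to keep the sketch short.

```python
def replay(mutations):
    """Apply (operation, key, value) mutations to an empty target and return it."""
    target = {}
    for operation, key, value in mutations:
        if operation == "delete":
            target.pop(key, None)
        else:  # "insert" or "update", both treated as an upsert in this toy model
            target[key] = value
    return target


# Rows already in the users table before capture starts (steps 1 and 2).
source = {1: "alice", 2: "bob"}

# Step 3: only mutations made after the replication set exists are captured.
captured = [("insert", 3, "carol"), ("update", 1, "alicia")]
for operation, key, value in captured:
    source[key] = value

# Step 4: a target built only from captured mutations never sees row 2.
target = replay(captured)
missing = sorted(set(source) - set(target))
print(missing)  # [2]
```

Row 2 existed before capture began and was never mutated afterwards, so nothing in the captured stream ever mentions it.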

I don't follow this sentence though:

[T]he consistent snapshot exposed to replication targets will be broken. (Technically, only replication targets that were bootstrapped before the content was added to the table would be corrupted, but it's probably better to just unilaterally prevent the behavior until a better solution exists.)

I feel like I may be conflating two ideas of a "replication set": one being the top-level replication set in PGShovel, the other being the "replication" tooling that is being worked on in the replication branch. So I'm a little confused about exactly what this is saying...