samstokes opened this issue 8 years ago
Right now a few things make it tricky to trial Bottled Water on an existing database taking production traffic:

- The initial snapshot does a `SELECT *` from all tables, which will probably do bad things to the database's cache unless the entire dataset (plus indices) can fit in RAM;
- If your Kafka cluster has `auto.create.topics.enable=false`, you'll need to create the appropriate topics for every table before starting Bottled Water.

A related problem is what to do if Bottled Water is running on a primary server that fails, and we promote a standby to become the new primary. To keep the Kafka updates flowing, we probably want to point Bottled Water at the new primary, but that raises questions of its own.

This encompasses a lot of individual changes we might make (e.g. add a table whitelist, add a `--skip-snapshot` flag, integrate Zookeeper support so Bottled Water can create its own Kafka topics, etc.), but it would be useful to have a plan for how this ought to work. Creating this issue to collect discussion about this topic!
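For the topic-creation point, here is a minimal sketch of pre-creating one topic per table with kafka-python's admin client; the broker address, table list, and partition settings are illustrative assumptions, not anything Bottled Water prescribes:

```python
# Sketch: pre-create one Kafka topic per table when the brokers run with
# auto.create.topics.enable=false. Assumes a topic named after each table;
# broker address, table names, and partition counts are placeholders.
from kafka.admin import KafkaAdminClient, NewTopic

tables = ["users", "orders", "line_items"]  # illustrative table names

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([
    NewTopic(name=t, num_partitions=1, replication_factor=1) for t in tables
])
```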
It can be worse: it can lock other transactions. Bottled Water does the snapshot as one non-stop `SELECT`, row by row, from all tables. So I have a proposal for the Bottled Water snapshot: fetch a batch of rows (for example 5000), send them all to Kafka, then wait for the ACK, then fetch the next 5000 rows. Or can we do the snapshot on a standby server? What do you think @samstokes?
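A minimal sketch of that batching loop, assuming psycopg2 and kafka-python; the connection string, table name, topic name, and batch size are illustrative, and this is not Bottled Water's actual snapshot code:

```python
# Sketch of the proposed batched snapshot: fetch a batch of rows, publish
# them to Kafka, wait for the broker's ACK, then fetch the next batch.
import json

import psycopg2
from kafka import KafkaProducer

BATCH_SIZE = 5000

conn = psycopg2.connect("dbname=mydb")
# Run the whole snapshot on one consistent MVCC snapshot.
conn.set_session(isolation_level="REPEATABLE READ", readonly=True)

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    acks="all",  # wait for full acknowledgement of each send
    value_serializer=lambda v: json.dumps(v, default=str).encode("utf-8"),
)

# A named (server-side) cursor streams rows instead of materializing
# the entire SELECT * result set in client memory.
with conn.cursor(name="bw_snapshot") as cur:
    cur.execute("SELECT * FROM my_table")
    while True:
        rows = cur.fetchmany(BATCH_SIZE)
        if not rows:
            break
        futures = [producer.send("my_table", list(row)) for row in rows]
        for f in futures:
            f.get(timeout=30)  # block until this batch is ACKed

producer.flush()
conn.close()
```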
> So I have a proposal for the Bottled Water snapshot: fetch a batch of rows (for example 5000), send them all to Kafka, then wait for the ACK, then fetch the next 5000 rows. Or can we do the snapshot on a standby server?
Are you sure that the `SELECT` can block other transactions? My understanding was that Postgres' MVCC design meant that queries running on a consistent snapshot (at the `REPEATABLE READ` isolation level) would execute without blocking other transactions.
> Are you sure that the `SELECT` can block other transactions?
Yeah, it cannot block other transactions according to the docs. But when I start the Bottled Water snapshot, other transactions take much, much longer to execute than usual, and some transactions cannot execute at all.
I guess that makes sense - the long-running `SELECT` will generate a lot of CPU and I/O load, plus a lot of table bloat if there is high write volume, both of which could slow down other queries.
> some transactions cannot execute at all.
Do you mean that you observe some transactions aborting with an error? Do you see any specific error message, or are they just timing out?
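One way to answer that empirically while the snapshot runs is to look at `pg_stat_activity` for sessions that are waiting versus merely slow. A sketch, assuming psycopg2 and Postgres 9.6+ (the `wait_event` columns do not exist in earlier releases; the connection string is illustrative):

```python
# Check whether concurrent transactions are blocked (waiting on a lock or
# I/O) or just slow while the snapshot runs.
import psycopg2

conn = psycopg2.connect("dbname=mydb")
with conn.cursor() as cur:
    cur.execute("""
        SELECT pid, state, wait_event_type, wait_event,
               now() - query_start AS runtime,
               left(query, 60) AS query
        FROM pg_stat_activity
        WHERE state <> 'idle'
        ORDER BY runtime DESC NULLS LAST
    """)
    for row in cur.fetchall():
        print(row)
conn.close()
```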
I suppose that to support snapshots of large databases, we might need either to run the snapshot on a standby, as you suggest, or to teach Bottled Water how to restore Postgres backups directly.
Unfortunately, according to this blog post, it doesn't look like there's currently a way to have a logical replication consumer like Bottled Water start consuming from a specific point in the past, which we'd need to avoid missing updates after either bootstrapping from a standby server or failing over to a promoted standby.
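To make that constraint concrete: a logical replication slot only streams changes from the point at which the slot was created, and there is no supported way to rewind it to an arbitrary earlier LSN. A small demonstration using the built-in `test_decoding` output plugin (the slot name and connection string are illustrative; requires `wal_level = logical` and a free replication slot):

```python
# Illustrates that a logical replication slot streams changes only from
# the point the slot was created; changes made before that point are
# simply not available through the slot.
import psycopg2

conn = psycopg2.connect("dbname=mydb")
conn.autocommit = True
with conn.cursor() as cur:
    # Changes committed before this call cannot be consumed via the slot.
    cur.execute(
        "SELECT * FROM pg_create_logical_replication_slot(%s, 'test_decoding')",
        ("bw_trial_slot",),
    )
    print(cur.fetchone())  # (slot_name, consistent-point LSN)

    # Peek at pending changes without consuming them.
    cur.execute(
        "SELECT * FROM pg_logical_slot_peek_changes(%s, NULL, NULL)",
        ("bw_trial_slot",),
    )
    for lsn, xid, data in cur.fetchall():
        print(lsn, xid, data)
conn.close()
```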