electric-sql / electric

Sync little subsets of your Postgres data into local apps and services.
https://electric-sql.com
Apache License 2.0

Send data out immediately on the write-path #1760

Open balegas opened 4 days ago

balegas commented 4 days ago

When a transaction comes in, the server persists its changes to the shape logs before sending them to subscribers waiting for new data. This adds a fair amount of latency to the write path. In this issue we want to stream rows to clients as soon as we find a shape match, and persist changes to shape logs in the background.

In order to do that, we're going to buffer row -> [shapeIds] and have a process that persists the changes for those shapes. We don't ack a transaction from Postgres until all shapes for a transaction have been updated. This allows us to recover gracefully from crashes, as the server can continue from where it stopped and just skip shape logs that have already been written.
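A minimal Python sketch of the flow described above, for illustration only and not Electric's actual implementation. The names `ShapeLogs`, `WritePath`, `send` and `ack` are hypothetical, the shape logs are kept in memory as a stand-in for disk, and it assumes each transaction carries a monotonically increasing LSN.

```python
from collections import deque


class ShapeLogs:
    """In-memory stand-in for the on-disk shape logs (hypothetical)."""

    def __init__(self):
        self.entries = {}  # shape_id -> list of (lsn, row)

    def last_lsn(self, shape_id):
        log = self.entries.get(shape_id)
        return log[-1][0] if log else -1

    def append(self, shape_id, lsn, rows):
        # Idempotent write: a transaction replayed after a crash is skipped
        # for every shape log that already contains it.
        if lsn <= self.last_lsn(shape_id):
            return False
        self.entries.setdefault(shape_id, []).extend((lsn, r) for r in rows)
        return True


class WritePath:
    def __init__(self, logs, send, ack):
        self.logs, self.send, self.ack = logs, send, ack
        self.pending = deque()  # transactions streamed but not yet persisted

    def on_transaction(self, lsn, matched_rows):
        """matched_rows: list of (row, [shape_ids]) from shape matching."""
        by_shape = {}
        for row, shape_ids in matched_rows:
            for sid in shape_ids:
                self.send(sid, lsn, row)  # stream to subscribers right away
                by_shape.setdefault(sid, []).append(row)
        self.pending.append((lsn, by_shape))

    def persist_next(self):
        """Background step: write the oldest pending transaction, then ack."""
        if not self.pending:
            return False
        lsn, by_shape = self.pending[0]
        for sid, rows in by_shape.items():
            self.logs.append(sid, lsn, rows)
        self.pending.popleft()
        self.ack(lsn)  # only after every shape log has been updated
        return True
```

Because the ack happens last, a crash between `on_transaction` and `persist_next` means Postgres resends the transaction, and the LSN check in `append` skips the shape logs that were already written.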

The buffer has a fixed size (configurable?). Pending transactions are written to disk:

It might happen that Postgres writes to Electric faster than Electric can handle shape logs. The developer would need to handle that situation by increasing the buffer size, or by accounting for growth of the PG WAL.
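One possible reading of the fixed-size buffer, sketched in Python under stated assumptions: when the buffer is full the producer stops consuming from the replication stream, so unconsumed changes accumulate in the PG WAL until the background writer catches up. `BoundedTxnBuffer`, `offer` and `take` are hypothetical names, and the sketch does not model spilling pending transactions to disk.

```python
import queue


class BoundedTxnBuffer:
    """Fixed-capacity buffer of transactions awaiting persistence (hypothetical)."""

    def __init__(self, max_txns):
        if max_txns < 1:
            raise ValueError("max_txns must be at least 1")
        self._q = queue.Queue(maxsize=max_txns)

    def offer(self, txn):
        """Try to enqueue a transaction.

        Returns False when the buffer is full. The caller is then expected to
        pause reading from the replication stream, which is what makes the WAL
        grow on the Postgres side.
        """
        try:
            self._q.put_nowait(txn)
            return True
        except queue.Full:
            return False

    def take(self):
        """Dequeue the oldest transaction, or None if the buffer is empty."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None
```

With `max_txns=2`, a third `offer` is rejected until the background writer calls `take`, which is the point at which a larger buffer or more WAL headroom would be needed.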

This task should be done after #1744, as it builds on the assumption that a single process determines which shapes need to be written (we'd have to revise the approach otherwise).

thruflo commented 4 days ago

Just an observation that a client may be able to re-connect after receiving a response within the buffer window. So we should serve new requests from memory where possible as well as currently blocked live requests.

marc-shapiro commented 3 days ago

Two conflicting statements: "persist changes to shape logs in the background" vs "We don't ack a transaction from Postgres until all shapes for a transaction have been updated". In fact, you can ack a transaction as soon as it is persisted, and you can perform shape matching and propagation in a parallel background task that doesn't have to ack. In case of a crash, either the transaction was not persisted hence not ack'ed, and you get it from the server; or it was and you get it from the persisted log. Again, it helps to have a single on-disk log common to all shapes.
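To make the alternative in this comment concrete, here is a rough Python sketch, assuming a single common log that is durable once appended (modelled here as an in-memory list). `CommonLog`, `match`, `send` and `ack` are hypothetical names, and persisting the propagation cursor across restarts is not modelled.

```python
class CommonLog:
    """Single log shared by all shapes; ack happens right after the append."""

    def __init__(self, ack):
        self.txns = []   # stand-in for one durable on-disk log
        self.ack = ack
        self.cursor = 0  # index of the next transaction to match and propagate

    def on_transaction(self, lsn, rows):
        self.txns.append((lsn, rows))  # persist first (durability assumed)
        self.ack(lsn)                  # safe to ack: the log now holds the txn

    def propagate(self, match, send):
        """Background task: shape matching and fan-out, with no ack involved."""
        while self.cursor < len(self.txns):
            lsn, rows = self.txns[self.cursor]
            for row in rows:
                for shape_id in match(row):
                    send(shape_id, lsn, row)
            self.cursor += 1
```

The trade-off raised later in the thread still applies: with one common log, reads for a given shape have to scan or index that log rather than read a per-shape file.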

balegas commented 3 days ago

"In case of a crash, either the transaction was not persisted hence not ack'ed, and you get it from the server"

You can't get the operation from the PG WAL once it's acked. We hold back the ack to PG so that we can recover from the point where the server crashed.

"Again, it helps to have a single on-disk log common to all shapes"

Yeah, we can come back to this. The reasoning is that we don't want to scan logs on reads, because reads are a lot more frequent than writes.

balegas commented 3 days ago

"Just an observation that a client may be able to re-connect after receiving a response within the buffer window. So we should serve new requests from memory where possible as well as currently blocked live requests."

Yeah, the issue description is not clear about that. In the original RFC I suggested doing that by scanning the common buffer, or by holding a buffer for each shape. That needs to be clarified during implementation.

Well spotted. Thanks for raising it.
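As a rough illustration of the "scan the common buffer" option, here is a Python sketch of serving a reconnecting client from the persisted shape log plus the in-memory buffer. The function name `read_shape` and the data layout are assumptions, as is the premise that a transaction is written to a given shape log atomically, so the last persisted LSN is enough to avoid duplicates.

```python
def read_shape(shape_id, after_lsn, log_entries, buffer):
    """Return the (lsn, row) entries a client still needs for one shape.

    log_entries: persisted (lsn, row) pairs for this shape, in LSN order.
    buffer: common in-memory buffer of (lsn, row, shape_ids), in LSN order,
            which may overlap with what has already been persisted.
    """
    # Serve whatever is already on disk first.
    result = [(lsn, row) for lsn, row in log_entries if lsn > after_lsn]
    persisted_upto = log_entries[-1][0] if log_entries else -1
    # Then scan the common buffer for newer rows matching this shape,
    # skipping anything the shape log already contains.
    floor = max(after_lsn, persisted_upto)
    for lsn, row, shape_ids in buffer:
        if lsn > floor and shape_id in shape_ids:
            result.append((lsn, row))
    return result
```

Holding a buffer per shape would remove the scan over unrelated rows at the cost of storing a row once for every shape it matches.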