electric-sql / electric

Sync little subsets of your Postgres data into local apps and services.
https://electric-sql.com
Apache License 2.0

Clean up publication filters when shapes are removed #1774

Open msfstef opened 1 week ago

msfstef commented 1 week ago

Currently the replication publication gets altered when a shape is created (adding a table and potentially a row filter), but no cleanup ever occurs, so the publication can grow to include everything, persisting even across restarts and deploys, and slow the system down over time.

My suggestion for addressing this is to change Electric.Postgres.Configuration to alter the publication based on all active shapes rather than on each individual one. Every call would then update the entire publication and its filters as necessary, so resuming/cleaning up becomes a matter of making this call every time a shape is deleted and once on startup (with recovered shapes, or with none).
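A minimal sketch of that idea, with invented names throughout (`PublicationSync`, the `{table, where}` shape representation, and the `electric_publication` publication name are all assumptions for illustration, not Electric's actual API):

```elixir
defmodule PublicationSync do
  @moduledoc "Sketch only: derive the publication from all active shapes."

  # `active_shapes` is a list of {table, where_clause_or_nil} tuples;
  # a nil filter means the shape covers the whole table.
  # (An empty shape set would need DROP handling, elided here.)
  def sync_publication(conn, active_shapes) do
    tables =
      active_shapes
      |> Enum.group_by(fn {table, _w} -> table end, fn {_t, where} -> where end)
      |> Enum.map(fn {table, wheres} ->
        if nil in wheres do
          # At least one shape needs the whole table: no row filter.
          table
        else
          "#{table} WHERE (#{Enum.map_join(wheres, " OR ", &"(#{&1})")})"
        end
      end)

    # SET TABLE replaces the table list wholesale, so tables and filters
    # that no longer back any shape are dropped automatically.
    sql = "ALTER PUBLICATION electric_publication SET TABLE #{Enum.join(tables, ", ")}"
    Postgrex.query(conn, sql, [])
  end
end
```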

KyleAMathews commented 1 week ago

> can be a matter of calling this every time a shape is deleted and once upon starting (with recovered shapes or no shapes).

How much work is it to generate this call? What if 1000s of new shapes are being created every second?

msfstef commented 1 week ago

The cleanups can generally be scheduled, but it's important to note that not cleaning up also means we keep processing data through the replication stream that we don't need, which itself takes up resources. Delaying a few requests in order to free up resources is probably still the better option, but we'd need to benchmark that. Cleaning up at any point is probably the first step.

We could also avoid calling it whenever a shape is deleted and only call it when a shape is created, like we do now; the creation call would clean up whatever is no longer used (and we could perhaps trigger periodic cleanups as well, rather than cleaning on every shape deletion).

For reference, we're already doing at least 2 round trips to Postgres whenever a shape is created in order to adjust the publication filter, and that happens on the critical path, so I doubt the cleanup will be a bottleneck. A rough sketch of the periodic variant is below.
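A rough sketch of the periodic-cleanup variant, assuming the `PublicationSync.sync_publication/2` sketch above plus a hypothetical `Electric.Shapes.list_active/0`; the interval is invented and would need benchmarking:

```elixir
defmodule CleanupScheduler do
  @moduledoc "Sketch only: periodically re-derive the publication."
  use GenServer

  @interval :timer.minutes(5)

  def start_link(conn), do: GenServer.start_link(__MODULE__, conn, name: __MODULE__)

  @impl true
  def init(conn) do
    schedule()
    {:ok, %{conn: conn}}
  end

  @impl true
  def handle_info(:cleanup, %{conn: conn} = state) do
    # Re-derive the publication from the current set of active shapes,
    # dropping tables/filters that no longer back any shape.
    # Electric.Shapes.list_active/0 is a placeholder for illustration.
    PublicationSync.sync_publication(conn, Electric.Shapes.list_active())
    schedule()
    {:noreply, state}
  end

  defp schedule, do: Process.send_after(self(), :cleanup, @interval)
end
```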

balegas commented 1 week ago

It's a bit of an orthogonal point to this issue, but we need to rethink how we set up filtering on the replication slot.

We cannot set a replication filter per shape; we need to be more selective with the WHERE clauses. It will never work to pass filters like WHERE id = x, WHERE id = y (filtering by id) as the number of shapes grows.
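Purely as an illustration of the concern (table and column names invented), a row filter assembled from one clause per shape grows linearly with the number of shapes on a table:

```elixir
# One clause per shape: the filter Postgres must evaluate for every
# replicated row grows linearly with the number of shapes.
clauses = Enum.map(1..10_000, fn id -> "(id = #{id})" end)
row_filter = Enum.join(clauses, " OR ")

# e.g. ALTER PUBLICATION ... SET TABLE items WHERE ((id = 1) OR ... OR (id = 10000))
IO.puts("row filter is #{String.length(row_filter)} characters long")
```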

If we think at a macro level, one Electric instance needs to be able to handle a certain number of rows, no matter how many filters you put into the replication slot. If we get to the point where the max throughput of Electric is not high enough, we need to do shape matching more efficiently.

I'll create an issue for evaluating the performance impact of filtering.

KyleAMathews commented 1 week ago

Columns + tables seems sustainable.

balegas commented 1 week ago

Yeah, we already subscribe to tables, but we should also drop a table from the replication slot when the last shape for that table is gone. Columns + tables will not trigger updates so frequently.

@msfstef is there a quick route to implementing "drop a table when its last shape is gone"? Otherwise, I think this can be left until we work on https://github.com/electric-sql/electric/issues/1778

msfstef commented 1 week ago

@balegas it would require a bit of rewiring but I think we could do the following:

  1. Boot up Electric and recover any persisted shapes.
  2. Build and maintain an index that reference-counts tables (and, in the future, perhaps columns as well).
     2a. We can also rebuild this index from the list of shapes every time; that shouldn't be too expensive even with 100k shapes.
  3. On every shape addition/removal, update the index and add/remove tables and filters on the publication.
     3a. This means we only update the publication when tables and/or columns are added or removed, so most shapes, if they select the same columns and operate on the same table with different filters, would not affect the publication.

Should be a relatively simple solution, and we can leverage an ETS table to avoid funnelling everything through a single process. A minimal sketch of the reference-counting index follows.
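A minimal sketch of that index using ETS counters keyed by table name (`ShapeTableIndex` and its functions are invented for illustration; the real wiring into shape creation/deletion would differ):

```elixir
defmodule ShapeTableIndex do
  @moduledoc """
  Sketch only: reference-count how many active shapes use each table, so
  the publication only needs altering when a count moves between 0 and 1.
  """

  @table __MODULE__

  # :public + write_concurrency lets shape processes update counts
  # directly, without funnelling through a single process.
  def init do
    :ets.new(@table, [:named_table, :public, :set, write_concurrency: true])
  end

  # Returns :added_table when this is the first shape on the table,
  # i.e. the publication must be altered to include it.
  def shape_added(table) do
    case :ets.update_counter(@table, table, {2, 1}, {table, 0}) do
      1 -> :added_table
      _ -> :noop
    end
  end

  # Returns :removed_table when the last shape on the table is gone,
  # i.e. the table can be dropped from the publication.
  def shape_removed(table) do
    case :ets.update_counter(@table, table, {2, -1}) do
      0 ->
        :ets.delete(@table, table)
        :removed_table

      _ ->
        :noop
    end
  end
end
```

On `:added_table` or `:removed_table` the caller would then alter the publication, which matches 3a above: a shape that merely adds a different filter on an already-published table never touches the publication.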