electric-sql / electric

Sync little subsets of your Postgres data into local apps and services.
https://electric-sql.com
Apache License 2.0

Horizontally scalable #1459

Status: Open. thruflo opened this issue 3 months ago.

thruflo commented 3 months ago

Can run multiple Electric instances against a single PG.

Can failover between and load balance across them.

Document and demonstrate.

chunkerchunker commented 3 days ago

Is there an outline of how this works anywhere online? In particular, how would shape IDs be made consistent across servers? Do I understand correctly that shape IDs are currently unique to each server?

Is the (future) intent to base shape IDs and offsets off of the Postgres global txid?

Also, is there a general roadmap for load balancing & failover support?

Thanks!

KyleAMathews commented 2 days ago

You'd need to make resolving shapes sticky to one instance. That's the main trick for keeping shapes consistent. You can now set up multiple Electric instances against a single DB. Load balancing shapes across multiple instances with stickiness and failover is then a matter of configuring an HTTP proxy.
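One way a proxy could make shape routing sticky is rendezvous (highest-random-weight) hashing on a key that identifies the shape. The sketch below is a hypothetical illustration, not Electric's or any proxy's code: the instance URLs, the shape key format, and how the proxy learns which instances are healthy are all assumptions.

```python
import hashlib


def _weight(instance: str, shape_key: str) -> int:
    # Deterministic pseudo-random weight for an (instance, shape) pair.
    digest = hashlib.sha256(f"{instance}|{shape_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_instance(shape_key: str, instances: list, healthy: set) -> str:
    """Rendezvous (highest-random-weight) hashing.

    Every proxy node that shares the same instance list ranks the instances
    identically for a given shape, so requests for that shape stick to one
    instance. If that instance is down, the next-ranked healthy one takes over.
    """
    ranked = sorted(instances, key=lambda inst: _weight(inst, shape_key), reverse=True)
    for inst in ranked:
        if inst in healthy:
            return inst
    raise RuntimeError("no healthy Electric instance available")


if __name__ == "__main__":
    # Hypothetical instance URLs and shape key.
    instances = ["http://electric-a:3000", "http://electric-b:3000", "http://electric-c:3000"]
    key = "items|owner_id = 42"
    primary = pick_instance(key, instances, set(instances))
    fallback = pick_instance(key, instances, set(instances) - {primary})
    print(f"{key!r} -> {primary} (fallback: {fallback})")
```

Rendezvous hashing is only one option (a consistent-hash ring or a proxy-set cookie would also work); its appeal is that losing an instance only moves the shapes that instance owned. The fallback instance has no shape log for the moved shapes, though, so clients following them still have to resync.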

We're renaming shape_id btw, as it's a misnomer: https://github.com/electric-sql/electric/issues/1771

chunkerchunker commented 2 days ago

Thank you for the quick reply (and for the cool platform). I saw those, and I've been able to start multiple servers. Sticky connections make sense. But in the case of failover, with the current system, a client would get a must-refetch and would have to resync from scratch.
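A minimal client-side sketch of what that failover costs. It assumes a message format loosely modelled on Electric's HTTP API (row messages with `key`, `value` and `headers.operation`, a `must-refetch` control message, and `-1` as the offset that requests a fresh snapshot); the `ShapeClientState` class and its methods are hypothetical, not part of Electric's client.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumption: the offset value that asks the server for a fresh snapshot.
INITIAL_OFFSET = "-1"


@dataclass
class ShapeClientState:
    handle: Optional[str] = None
    offset: str = INITIAL_OFFSET
    rows: dict = field(default_factory=dict)

    def advance(self, handle: str, offset: str) -> None:
        # Remember which log we are following and where the next request resumes.
        self.handle = handle
        self.offset = offset

    def apply(self, message: dict) -> None:
        headers = message.get("headers", {})
        if headers.get("control") == "must-refetch":
            # The log we were following is gone (e.g. the proxy failed over to an
            # instance that never built it): drop local data and resync from scratch.
            self.handle = None
            self.offset = INITIAL_OFFSET
            self.rows.clear()
            return
        operation = headers.get("operation")
        key = message.get("key")
        if operation in ("insert", "update"):
            # Updates are merged into the existing row.
            self.rows.setdefault(key, {}).update(message.get("value", {}))
        elif operation == "delete":
            self.rows.pop(key, None)
```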

This is probably a dumb question, but I'll ask anyway: why does the shape handle include the current timestamp

https://github.com/electric-sql/electric/blob/de92ec684ae29eb1b28608a663ae9807583b4261/packages/sync-service/lib/electric/shapes/shape.ex#L45

and why can't the offset just be the current Postgres txid? Would something like that allow for load-balancing without stickiness and failover without requiring resync?

thruflo commented 2 days ago

> why can't the offset just be the current Postgres txid? Would something like that allow for load-balancing without stickiness and failover without requiring resync?

Right now we query the source Postgres to populate shape logs on the server, and we can't query Postgres as of an arbitrary past transaction. This does mean that if a shape log disappears, clients need to re-sync. It's a trade-off: we gain simplicity in the initial implementation at the cost of re-syncing, with the caveat that re-syncing can be very fast.
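To make that trade-off concrete, here is a toy model of the shape logs held by a single instance. Everything in it is a hypothetical illustration, not Electric's implementation: the `ShapeLogStore` class, integer offsets, and a handle built from a hash of the shape definition plus the creation time in milliseconds (which only mirrors the timestamp question above). It shows why an instance that never built a given log can answer a client's handle and offset only with a must-refetch.

```python
import hashlib
import time
from typing import Dict, List, Optional


class ShapeLogStore:
    """Toy model of the shape logs held by ONE Electric instance (hypothetical)."""

    def __init__(self) -> None:
        self.logs: Dict[str, List[dict]] = {}

    def create(self, shape_def: str, snapshot_rows: List[dict],
               created_ms: Optional[int] = None) -> str:
        # Illustrative handle: a stable hash of the shape definition plus the
        # creation time, so a log rebuilt elsewhere gets a different handle.
        if created_ms is None:
            created_ms = time.time_ns() // 1_000_000
        digest = hashlib.sha256(shape_def.encode("utf-8")).hexdigest()[:12]
        handle = f"{digest}-{created_ms}"
        # The log starts from a snapshot queried from Postgres *now*; there is
        # no way to ask Postgres for the rows as of an earlier transaction.
        self.logs[handle] = list(snapshot_rows)
        return handle

    def append(self, handle: str, change: dict) -> None:
        # Later changes (from logical replication) are appended after the snapshot.
        self.logs[handle].append(change)

    def read(self, handle: str, offset: int) -> dict:
        log = self.logs.get(handle)
        if log is None:
            # This instance never built that log, so it cannot resume the
            # client's offset: the client has to resync from scratch.
            return {"control": "must-refetch"}
        return {"entries": log[offset:], "next_offset": len(log)}


if __name__ == "__main__":
    a, b = ShapeLogStore(), ShapeLogStore()
    handle = a.create("items where owner_id = 42", [{"id": 1}])
    a.append(handle, {"id": 2})
    print(a.read(handle, 1))  # resumes on the instance that owns the log
    print(b.read(handle, 1))  # failover target: {'control': 'must-refetch'}
```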

We are aware of different implementation strategies that could allow a fresh Electric instance to take over serving a shape without necessitating client re-sync. But these are more complex, so they're not something we're actively working on right now.

KyleAMathews commented 2 days ago

> I've been able to start multiple servers

Very cool you got that working!

And yeah, resyncing is a bit annoying, but you can think of Electric as a fancy cache, so our aim is to maximize cache hits. There are diminishing returns, though, to handling edge cases like automatic failover when a server crashes.